MULTIPLE DECISION PROCEDURES FOR ANOVA OF TWO-LEVEL
FACTORIAL FIXED-EFFECTS REPLICATION-FREE EXPERIMENTS*
by Arthur G. Holms and J. N. Berrettoni†
Lewis Research Center 

SUMMARY 

For expensive areas of experimentation, such as alloy development, pressure vessel
burst testing, and high-temperature protective coatings, the appropriate experiments
consist of two-level fixed-effects factorial designs without replication. No adequate
procedures have been available for the statistical analysis of such experiments.

A procedure called "chain pooling" is introduced for testing the significance of
terms of a model equation as fitted to the observations from a fractional factorial
experiment. The procedure starts with a small group containing only the smallest of the
ordered squared coefficients of the model equation in the denominator of a test statistic.
Stepwise testing, in the increasing order of succeeding squared coefficients, pools
nonsignificant squares into the denominator of the statistic, which is used for continued
testing.

Monte Carlo computations were performed to determine the decision error
probabilities for many different variations of chain pooling and to compare the relative
advantages of the variations for fractional factorial experiments with $2^4$, $2^5$, and
$2^6$ treatment combinations. These computations were performed with values of the significant
coefficients distributed in such a manner as to contribute to high probabilities of decision
errors, so that the recommended procedures are good against the worst possible
conditions.

For any real experiment, the actual decision error probabilities depend on the
magnitudes of the coefficients. A method is given for estimating upper limits of weighted
average decision error probabilities after the coefficients have been estimated. The
procedures are illustrated by an example from high-temperature-alloy development.

*This report is based on a dissertation submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy at Western Reserve University,
Cleveland, Ohio, June 1966.

†Chairman, Department of Statistics, Western Reserve University, Cleveland, Ohio.


INTRODUCTION 

The factorial experiment is useful for observing the response of a continuous de- 
pendent variable to changes in the independent variables. The independent variables 
may be continuous, discrete, or qualitative. The factorial experiment is preferred to 
other designs when certain combinations of levels of the independent variables can affect 
the response (interact). 

If independent variables $x_1, x_2, x_3, \ldots$ are to be investigated at numbers of
levels $a, b, c, \ldots$, and if the error is to be measured with an $r$-fold replication,
the full factorial experiment includes $r \cdot a \cdot b \cdot c \cdots$ observations. A replicated
full factorial experiment can be too expensive in such fields as alloy development,
destructive tests of structures (liquid rocket fuel tank bursting), or where many
variables are involved (high-temperature protective coatings).

Identifying those factors that affect the response directly or through interactions is
done efficiently by performing the experiments at only two levels of each factor. A full
factorial experiment on $g$ factors then requires $2^g$ observations and provides
estimates of the direct effects and all interactions.

The interactions involving the larger numbers of factors are often anticipated to be
negligible, and the experiments are then performed as fractional replicates. A
fraction $(1/2)^h$ of the full factorial experiment is performed. The number of observations
is

$$2^\ell = 2^{g-h}$$

where $\ell = g - h$.

The $2^{g-h}$ experiment is preferred when each experimental unit is costly but
any existing interactions should be discovered. The basic design could involve $r \cdot 2^{g-h}$
observations, where an $r$-fold replication is used to estimate the error variance. The
economy achieved by not replicating (by setting $r = 1$) carries the penalty that there is
no obvious, or prior, valid mean square for estimating the error variance, and an
estimate of error variance is needed in selecting those effects that will be judged
significant.

If replication is lacking, a customary practice, according to Davies (ref. 1, p. 286),
consists of pooling some arbitrary number of the highest order interaction mean squares
into an estimate of error variance. However, if this practice is followed, any unknown
block effects could inflate some of the pooled interactions and thereby give too large an
estimate of error variance. Too large an error estimate reduces the sensitivity of
subsequent tests to detect real effects among the estimates of the main effects and lower
order interactions.

Preserving sensitivity when pooling mean squares into the estimate of error variance
has been an object of the procedures of Daniel (ref. 2) and of Wilk,
Gnanadesikan, and Freeny (ref. 3). Daniel uses the absolute values of the effect
estimates as order statistics. These values are plotted on probability paper, and the result
is called a half-normal plot. Such a display, combined with a background of experience,
might provide a method by which a skillful user could pass judgment on the results of
an experiment. Daniel concluded that the half-normal plot can be used to make
judgments about the reality of the largest effects observed only if a small proportion of the
effect estimates represent real effects.

Birnbaum (ref. 4) investigated procedures related to half-normal plotting. His
results on significance are limited to the single largest order statistic. He concluded,
however, that such procedures are optimal with respect to the two largest order
statistics.

For the $2^\ell$ experiment, and aside from the grand mean, there are $2^\ell - 1$ mean
squares requiring decisions as to significance. The procedure of Wilk, Gnanadesikan,
and Freeny (ref. 3) requires that some subjective or prior knowledge be used to decide
that $\eta$ of the $2^\ell - 1$ mean squares do not contain real effects and that, therefore,
$p = 2^\ell - \eta - 1$ mean squares do contain real effects. As shown in reference 3, the
procedure is not robust against errors in guessing the value of $\eta$, and $\eta$ must be guessed
because it is an unknown in the problem.

Daniel and Birnbaum have limited their results to experiments where only a small
proportion of the effects are thought to be significant. On the other hand, situations can
exist where the experimenter might design a two-level factorial experiment so that a
large proportion of the effects will be significant. A particular example occurs in the
development of superalloys. These alloys typically contain 5 to 15 elements. One
procedure for finding an optimum composition is to use Box-Wilson techniques (ref. 1, p. 495).
The first phase fits a first-degree response model to data gathered from a factorial
experiment. The costs of experimenting and the need to investigate many elements
imply that experiments should be fractionally replicated. Efficiency of the fractional
design requires that most of the degrees of freedom (in fitting the linear model) be
assigned to the direct effects, with only a few contrasts assigned to the interactions. The
Box-Wilson techniques also imply that the experimenter will eventually reach conditions where the
first-degree model is no longer valid, so he needs a method for deciding when to abandon that
model. One such method regards the interactions that are evaluated as a
sample of higher degree effects. If they give evidence of higher degree, the experimenter
performs the more extensive experiments required by the second-degree model.
(Sequences of blocked fractional designs, especially appropriate to proceeding in steps from
the first-degree model to the second-degree model, were presented in ref. 5.)

The observed effects from the two-level experiment are thus used for three
purposes:

(1) To estimate error variance 

(2) To evaluate main effects and test their significance 

(3) To evaluate interaction effects and test their significance 

In alloy development, the metallurgist often has enough prior knowledge to set com- 
position levels so that most of the main effects will be significant. Consequently, the 
testing of the sample of interactions must be based on an error variance estimate that 
comes from a small (but not predetermined) number of nonsignificant effects. Decision 
procedures are needed that will use a small conditional number of effects to estimate 
error variance. Such procedures were not provided by Daniel or Birnbaum. 

The method proposed herein, called chain pooling, tests a major proportion of the
mean squares in order of increasing magnitude. Certain procedures will be
proposed as being reasonable. Their properties will not be investigated analytically;
however, appropriate risk functions will be defined, and several variations of the suggested
procedures will be evaluated by Monte Carlo methods in terms of the risk functions.

The investigation will be limited to experiments of $2^4$, $2^5$, and $2^6$ fractional
factorial design.


SYMBOLS 


$C_j$  Cochran's statistic for largest of $j$ mean squares

$E(\ldots)$  expectation of $(\ldots)$

$e$  single-observation random error

$g$  number of factors in two-level experiment

$h$  fractional replicate contains $(1/2)^h$ of the observations of full factorial experiment

$i$  subscript denoting order of computing mean squares according to Yates'
algorithm; $i = 0, 1, 2, \ldots, 2^\ell - 1$

$j$  subscript denoting $j$th smallest mean square (exclusive of grand mean);
$j = 1, 2, \ldots, 2^\ell - 1$

$k$  subscript

$L(\lambda_i, d)$  loss for decision $d$ under parameter $\lambda_i$

$\ell$  experiment contains $2^\ell$ treatment combinations and produces that many
observations

$m$  number of mean squares pooled before testing begins

number of Monte Carlo generated experiments

$N(\ldots)$  normal distribution with parameters $(\ldots)$

sample size

observed probability of type 1 error when testing $i$th mean square

observed probability of type 2 error when testing $i$th mean square

$R_1$  risk associated with type 1 errors

$R_2$  risk associated with type 2 errors

$r$  number of replications

$U_j$  test statistic defined by eq. (8)

$V(\ldots)$  variance of $(\ldots)$

$x_1, x_2, \ldots$  levels of independent variables

$Y$  response

$Z$  mean square

$\alpha$  test size

$\alpha_f$  nominal size of final significance test

$\alpha_p$  nominal size of preliminary pooling test

$\beta$  type 2 error probability

$\delta_i$  parameter determining relative magnitudes of real effects in any one
experiment

$\zeta_j$  expectation of $j$th order statistic of a variable

$\eta$  number of mean squares having noncentrality parameter of zero

$\hat\eta$  estimate of $\eta$

$\theta$  scale parameter

estimate of $\lambda$

$\bar\lambda$  average noncentrality parameter

$\mu_i$  coefficients of eq. (1) that are estimated in Yates' order from Yates' contrasts

$\hat\mu_i$  estimate of $\mu_i$

$p$  number of real effects

$\hat p$  estimate of $p$

$\Sigma(\ldots)$  summation of $(\ldots)$

$\sigma$  standard deviation

$\hat\sigma$  estimate of $\sigma$

$\chi^2(\ldots)$  chi-square distribution with $(\ldots)$ degrees of freedom

detection efficiency defined by eq. (15)

CHAIN POOLING 
Analysis of Variance Model 

Consider a $2^\ell$ experiment with $\ell = 4$. The factors can be qualitative or quantitative
and are named $x_1$, $x_2$, $x_3$, and $x_4$. Their levels are represented by $+1$ for the
upper level and by $-1$ for the lower level. The model for the response is then written as

$$\begin{aligned}
Y = {} & \mu_0 + \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_1 x_2 + \mu_4 x_3 + \mu_5 x_1 x_3 + \mu_6 x_2 x_3 + \mu_7 x_1 x_2 x_3 \\
& + \mu_8 x_4 + \mu_9 x_1 x_4 + \mu_{10} x_2 x_4 + \mu_{11} x_1 x_2 x_4 + \mu_{12} x_3 x_4 + \mu_{13} x_1 x_3 x_4 \\
& + \mu_{14} x_2 x_3 x_4 + \mu_{15} x_1 x_2 x_3 x_4 + e
\end{aligned} \tag{1}$$

where

$$E(e) = 0, \qquad V(e) = \sigma^2$$

and $e$ is independently $N(0, \sigma^2)$.

The observations from the $2^\ell$ treatments are used to compute mean squares
conveniently by Yates' method (ref. 1, p. 263). Assume that the mean squares have
been computed in Yates' order and in this order are labeled $Z_0, Z_1, \ldots, Z_n$, where
$n = 2^\ell - 1$. The coefficients of equation (1) are in Yates' order. The expectations of
the $Z_i$ are

$$E(Z_i) = \sigma^2 + 2^\ell \mu_i^2, \qquad i = 0, 1, 2, \ldots, 2^\ell - 1 \tag{2}$$
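Yates' method is cited here rather than spelled out. A minimal Python sketch follows. The function names are ours, and the responses are assumed to be listed in standard order with $x_1$ changing fastest: (1), a, b, ab, c, .... Each of the $\ell$ passes replaces the working column by the sums of adjacent pairs followed by their differences. The final column holds the contrasts in Yates' order, from which $Z_i = (\text{contrast}_i)^2/2^\ell$ and the coefficient estimates $\hat\mu_i = \text{contrast}_i/2^\ell$ follow. The usage lines take error-free responses generated from $\mu = (10, 2, 3, 1)$ in a $2^2$ version of eq. (1).

```python
def yates_contrasts(y):
    """Contrasts of a 2**ell factorial experiment by Yates' algorithm.

    y holds the 2**ell responses in standard order, x1 changing fastest:
    (1), a, b, ab, c, ac, bc, abc, ...  The result is in Yates' order:
    grand total, x1, x2, x1x2, x3, ...
    """
    n = len(y)
    ell = n.bit_length() - 1
    if n == 0 or n != 2 ** ell:
        raise ValueError("number of observations must be a power of 2")
    col = [float(v) for v in y]
    for _ in range(ell):
        pairs = range(0, n, 2)
        # Sums of adjacent pairs, then differences (second minus first).
        col = [col[k] + col[k + 1] for k in pairs] + [col[k + 1] - col[k] for k in pairs]
    return col


def mean_squares(y):
    """Single-degree-of-freedom mean squares Z_0, ..., Z_n in Yates' order."""
    return [c * c / len(y) for c in yates_contrasts(y)]


def coefficient_estimates(y):
    """Estimates of mu_0, ..., mu_n of the +/-1 coded model (contrast / 2**ell)."""
    return [c / len(y) for c in yates_contrasts(y)]


# 2**2 example generated from mu = (10, 2, 3, 1) with no error.
y = [6, 8, 10, 16]
print(yates_contrasts(y))        # [40.0, 8.0, 12.0, 4.0]
print(mean_squares(y))           # [400.0, 16.0, 36.0, 4.0]
print(coefficient_estimates(y))  # [10.0, 2.0, 3.0, 1.0]
```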
The quantities $Z_i/\sigma^2$ are either central chi-square or, more generally, noncentral
chi-square variables that have 1 degree of freedom (ref. 6, p. 227). Let $\lambda_i$ be the
noncentrality parameter. Then

$$\lambda_i = \frac{2^\ell \mu_i^2}{\sigma^2} \tag{3}$$
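Equations (2) and (3) can be checked numerically. The sketch below illustrates the idea under stated assumptions and is not the report's program. It builds the treatment means of eq. (1) from a hypothetical set of coefficients for a $2^3$ experiment and adds $N(0, \sigma^2)$ errors using Python's `random` module. It then compares the simulated average of each $Z_i$ with $\sigma^2(1 + \lambda_i) = \sigma^2 + 2^\ell\mu_i^2$. The sign of term $i$ at treatment $t$ is taken as $-1$ raised to the number of factors of term $i$ that are at their lower level in treatment $t$. This matches the standard-order convention used for Yates' algorithm.

```python
import random


def yates_contrasts(y):
    """Yates' algorithm (responses in standard order, x1 changing fastest)."""
    n = len(y)
    col = [float(v) for v in y]
    for _ in range(n.bit_length() - 1):
        pairs = range(0, n, 2)
        col = [col[k] + col[k + 1] for k in pairs] + [col[k + 1] - col[k] for k in pairs]
    return col


def treatment_means(mu):
    """Model means of eq. (1) at each treatment, in standard order.

    Term i gets a factor -1 for every factor it contains that is at its
    lower level in treatment t, i.e. the sign is (-1)**popcount(i & ~t).
    """
    n = len(mu)
    return [sum(m * (-1) ** bin(i & ~t).count("1") for i, m in enumerate(mu))
            for t in range(n)]


def average_mean_squares(mu, sigma, n_rep, seed=1):
    """Average each Z_i over n_rep simulated experiments with N(0, sigma**2) error."""
    rng = random.Random(seed)
    n = len(mu)
    means = treatment_means(mu)
    totals = [0.0] * n
    for _ in range(n_rep):
        y = [m + rng.gauss(0.0, sigma) for m in means]
        for i, c in enumerate(yates_contrasts(y)):
            totals[i] += c * c / n
    return [t / n_rep for t in totals]


mu = [5.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0]   # hypothetical 2**3 coefficients
sigma = 1.0
avg = average_mean_squares(mu, sigma, 20000)
for i in range(1, len(mu)):
    lam = len(mu) * mu[i] ** 2 / sigma ** 2           # eq. (3)
    print(i, round(avg[i], 2), sigma ** 2 * (1 + lam))  # simulated vs eq. (2)
```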


Risk Functions 

Assume that $n$ single-degree-of-freedom mean squares are drawn from $n$
populations. An unknown number $p$ of the populations have real effects ($\lambda_i > 0$), and the
balance are null populations ($\lambda_i = 0$). A number $\hat p$ of the populations are to be selected
with the hope that they will be the populations with $\lambda_i > 0$.

Errors of selection are assumed to produce losses that depend on the parameters $\lambda_i$
and on the decision $d$, as given by loss functions $L(\lambda_i, d)$. The loss for any correct
decision is defined as zero. If the $i$th population is correctly decided to be null, the
type 1 loss is

$$L_1(\lambda_i, d) = L_1(\lambda_i = 0, \hat\lambda_i = 0) = 0$$

If the $i$th population is correctly decided to be nonnull, the type 2 loss is

$$L_2(\lambda_i, d) = L_2(\lambda_i > 0, \hat\lambda_i > 0) = 0$$

If the $i$th population is incorrectly selected (null hypothesis incorrectly rejected),
a type 1 unit loss is assumed:

$$L_1(\lambda_i, d) = L_1(\lambda_i = 0, \hat\lambda_i > 0) = 1 \tag{4}$$

This definition equates the type 1 risk (expectation of type 1 loss) to the probability
of a type 1 error.

A dimensionless parameter $\theta$ is defined by



i=l 
Let


Then 




( 5 ) 


( 6 ) 


A long tradition exists for saying that losses in estimation are proportional to the
square of the error. This tradition is followed by saying that if $\mu_i$ is incorrectly
decided to be zero, the type 2 loss will be proportional to $\mu_i^2$. The type 2 loss is made
proportional to the square of $\mu_i$ (if not selected) by

$$L_2(\lambda_i, d) = L_2(\lambda_i > 0, \hat\lambda_i = 0) = \delta_i^2 \tag{7}$$

The type 2 risk (expectation of a type 2 loss) is therefore equal to the probability of
a type 2 error multiplied by a weighting factor $\delta_i^2$, where $\delta_i^2$ is proportional to
the square of the $\mu_i$ that was not selected, and where the weighting factors have a
mean value of 1 for the set of $p$ nonnull populations.


Test Statistics 


As stated in reference 4, the optimal decision procedure for $p \le 1$ uses a test
developed by Cochran (ref. 7). The $Z_i$ in Yates' order (omitting the mean square for
the grand mean) are ordered in nondecreasing magnitude as $Z_j$:

$$Z_1 \le Z_2 \le \cdots \le Z_n$$

Cochran's statistic is

$$C_n = \frac{Z_n}{Z_1 + \cdots + Z_n}$$

and the null hypothesis is rejected with test size $\alpha$ if $C_n$ exceeds the upper $100\alpha$-percent
point of Cochran's distribution, which has been additionally tabulated in
reference 8.
Assume that the $\lambda_i$ in Yates' order are rewritten as $\lambda_j$ in nondecreasing order of
the mean squares. Rejection with Cochran's statistic suggests that $\lambda_n > 0$. If $\lambda_n > 0$
is true, then $\lambda_{n-1}$ might be tested with the statistic

$$C_{n-1} = \frac{Z_{n-1}}{Z_1 + Z_2 + \cdots + Z_{n-1}}$$

Suppose $\lambda_{n-1} = 0$ is rejected. This suggests that the test of $\lambda_n$ had very low
sensitivity, because $Z_{n-1}$ inflated its denominator. Obviously, a general multiple decision
procedure cannot be developed by using Cochran's test in the descending order of the mean
squares. In ascending order, let

$$C_2 = \frac{Z_2}{Z_1 + Z_2}$$

and if the test based on this statistic accepts $\lambda_2 = 0$, use the conclusion as an
assumption and form

$$C_3 = \frac{Z_3}{Z_1 + Z_2 + Z_3}$$

In other words, assume that the smallest mean square has been drawn from a
population with $\lambda_1 = 0$. Proceed stepwise with test statistics $C_j$ so long as $C_{j-1}$
indicates $\lambda_{j-1} = 0$. At the first rejection of the null hypothesis (e.g., for $C_j$), conclude
that $\lambda_j > 0$ and, because of the ordering, immediately conclude that $\lambda_k > 0$ for all
$k \ge j$. Thus, Cochran's test has been generalized to a sequence of dependent tests;
furthermore, the number of items that should be in the denominator is always unknown.
For these reasons the nominal $\alpha$ of Cochran's test will not be the true size for the
multiple procedure.

A trivial transformation of Cochran's statistic provides an alternative statistic.
It is

$$U_j = \frac{j Z_j}{Z_1 + Z_2 + \cdots + Z_j}, \qquad j = 2, \ldots, n \tag{8}$$
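As a small illustration (a sketch with a hypothetical function name), the statistics $C_j$ and $U_j = jC_j$ can be tabulated from a set of mean squares once they are sorted. Because $Z_j$ is the largest of $Z_1, \ldots, Z_j$, $U_j$ always lies between 1 and $j$. This is why the tabulated critical points for $j = 2$ approach 2.

```python
def chain_statistics(z):
    """C_j and U_j = j * C_j (eq. (8)) for j = 2, ..., n.

    z holds the single-degree-of-freedom mean squares with the grand mean
    already excluded; they are sorted into Z_1 <= Z_2 <= ... <= Z_n, and
    C_j = Z_j / (Z_1 + ... + Z_j) is Cochran's statistic for the j smallest.
    """
    zs = sorted(z)
    stats = {}
    running = zs[0]
    for j in range(2, len(zs) + 1):
        running += zs[j - 1]           # Z_1 + ... + Z_j
        c_j = zs[j - 1] / running
        stats[j] = (c_j, j * c_j)
    return stats


for j, (c_j, u_j) in chain_statistics([10.0, 1.0, 3.0, 2.0]).items():
    print(j, round(c_j, 4), round(u_j, 4))
```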
The basic process of chain pooling begins with $\lambda_1 = 0$ and $j = 2$. At each stage of
the testing, the composite null hypothesis

$$\lambda_1 = \cdots = \lambda_{j-1} = \lambda_j = 0; \quad 0 \le \lambda_{j+1}, \ldots, \lambda_n$$

is tested against the composite alternative hypothesis

$$\lambda_1 = \cdots = \lambda_{j-1} = 0; \quad 0 < \lambda_j, \lambda_{j+1}, \ldots, \lambda_n$$

The null hypothesis is rejected at the first $j$ for which $U_j$ exceeds the tabulated $100\alpha$-percent
point (table I), and the immediate conclusion is

$$0 < \lambda_j, \ldots, \lambda_n$$
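The basic chain just described can be sketched in a few lines of Python. This is an illustration, not the authors' program. It assumes a single nominal level $\alpha = 0.05$ and uses the table I critical points for $j = 2$ to 8 only, so it handles at most eight mean squares. The function name is hypothetical.

```python
# Upper 5 percent points of U_j for j = 2..8, copied from table I.
U_CRIT_05 = {2: 1.99687, 3: 2.904, 4: 3.625, 5: 4.21, 6: 4.68, 7: 5.09, 8: 5.44}


def basic_chain_pooling(z, crit=U_CRIT_05):
    """Mean squares declared significant by the basic chain (lambda_1 = 0, j = 2, ...).

    z: mean squares excluding the grand mean; crit[j] is the critical U_j.
    """
    zs = sorted(z)
    running = zs[0]                      # Z_1 is assumed null
    for j in range(2, len(zs) + 1):
        running += zs[j - 1]             # denominator Z_1 + ... + Z_j
        if j * zs[j - 1] / running > crit[j]:
            return zs[j - 1:]            # Z_j and every larger mean square
    return []


print(basic_chain_pooling([0.3, 0.1, 0.5, 0.2, 40.0, 25.0, 0.4, 30.0]))  # [25.0, 30.0, 40.0]
```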

Partly analogous to the method of Wilk et al. (ref. 3), some small number $m \ge 1$
of the smallest ordered mean squares are assumed to have been drawn from null
populations, so that testing begins with $j = m + 1$. Testing begins with the critical $U_j$ values
corresponding to a large nominal significance level $\alpha_p$, where typically
$0.25 \le \alpha_p \le 1.0$.

The special case $\alpha_p = 1.0$ means that the number of items in the denominator
remains fixed at $m + 1$, and testing proceeds at the final nominal level $\alpha_f$, where
typically $0.001 \le \alpha_f \le 0.05$. The test statistic for the $j$th mean square is then

$$U_{m+1} = \frac{(m + 1) Z_j}{Z_1 + \cdots + Z_m + Z_j} \tag{9}$$


If $\alpha_p < 1.0$, the tests (with the use of eq. (8)) at nominal level $\alpha_p$ are called
preliminary tests, and they continue so long as nonsignificance is the result. All $Z_j$
testing nonsignificant remain in the denominator. This procedure is analogous to the
familiar "sometimes pool" procedures that have been used in two-stage testing.
Because $\alpha_p$ is large, early in the chain some $Z_j$ would be selected as being too large
to be an obvious member of a population with $\lambda_j = 0$. On the other hand, because $\alpha_p$
is large, there should be no great confidence that $Z_j$ was drawn from a population
with $\lambda_j > 0$. Therefore, a new level $\alpha_f < \alpha_p$ is imposed. If a $Z_j$ is significant at
level $\alpha_p$, the same $Z_j$ is tested at level $\alpha_f$, also with the use of the $U_j$ statistic.
If $Z_j$ is concluded to be significant at level $\alpha_f$, then all $Z_k$ ($k \ge j$) are concluded to be
significant.






TABLE I. - UPPER $100\alpha$ PERCENT POINTS OF TEST STATISTIC $U_j$

(Rows: number of denominator mean squares, $j$. Columns: nominal test size, $\alpha$.)

  j     0.001     0.002     0.005      0.01     0.025      0.05      0.10      0.25      0.50      0.75
  2   2.00000   1.99999   1.99997   1.99986   1.99917   1.99687    1.9877     1.923     1.706     1.382
  3    2.9976    2.9960    2.9904    2.9809     2.951     2.904     2.806     2.527     2.086     1.688
  4     3.976     3.962     3.925     3.870     3.760     3.625     3.412     2.949     2.395     1.961
  5     4.887     4.845     4.758      4.65      4.44      4.21      3.89     3.287     2.658     2.184
  6      5.74      5.63      5.46      5.31      4.99      4.68      4.28      3.57     2.893     2.371
  7      6.51      6.33      6.11      5.87      5.46      5.09      4.61      3.83      3.11      2.54
  8      7.20      6.96      6.65      6.35      5.88      5.44      4.91      4.06      3.29      2.69
  9      7.81      7.52      7.10      6.78      6.26      5.75      5.17      4.27      3.45      2.82
 10      8.34      8.01      7.53      7.17      6.59      6.03      5.41      4.45      3.60      2.95
 11      8.82      8.44      7.95      7.53      6.89      6.28      5.61      4.62      3.74      3.07
 12      9.26      8.84      8.33      7.87      7.13      6.50      5.81      4.77      3.87      3.17
 13      9.67      9.21      8.68      8.16      7.37      6.71      5.99      4.92      3.99      3.27
 14     10.05      9.55      8.95      8.42      7.59      6.91      6.15      5.05      4.10      3.37
 15     10.40      9.86      9.20      8.66      7.79      7.07      6.30      5.17      4.20      3.46
 16     10.72     10.14      9.43      8.83      7.96      7.23      6.44      5.29      4.30      3.55
 17     11.01     10.40      9.64      9.00      8.12      7.38      6.57      5.40      4.39      3.63
 18     11.28     10.64      9.84      9.17      8.28      7.52      6.69      5.50      4.48      3.70
 19     11.53     10.86     10.03      9.34      8.43      7.65      6.81      5.60      4.56      3.77
 20     11.76     11.07     10.22      9.51      8.58      7.78      6.92      5.69      4.64      3.84
 21     11.98     11.28     10.40      9.67      8.72      7.90      7.03      5.78      4.71      3.90
 22     12.19     11.48     10.58      9.83      8.86      8.02      7.13      5.87      4.78      3.96
 23     12.39     11.68     10.76      9.99      8.99      8.13      7.23      5.95      4.85      4.02
 24     12.58     11.87     10.93     10.14      9.12      8.24      7.33      6.03      4.92      4.08
 25     12.76     12.05     11.10     10.29      9.23      8.34      7.42      6.11      4.98      4.14
 26     12.93     12.22     11.26     10.43      9.34      8.44      7.51      6.18      5.04      4.19
 27     13.09     12.38     11.41     10.56      9.44      8.54      7.60      6.25      5.10      4.24
 28     13.24     12.53     11.55     10.68      9.54      8.63      7.68      6.32      5.16      4.30
 29     13.39     12.68     11.68     10.78      9.64      8.72      7.76      6.38      5.22      4.35
 30     13.53     12.82     11.80     10.88      9.74      8.81      7.83      6.44      5.28      4.40
 31     13.67     12.96     11.91     10.98      9.83      8.89      7.90      6.50      5.33      4.45
 32     13.80     13.09     12.01     11.07      9.91      8.97      7.97      6.56      5.38      4.50
 33     13.93     13.21     12.10     11.16      9.99      9.04      8.04      6.62      5.43      4.54
 34     14.05     13.32     12.19     11.25     10.07      9.11      8.11      6.68      5.48      4.58
 35     14.17     13.43     12.27     11.34     10.15      9.18      8.17      6.74      5.53      4.62
 36     14.29     13.53     12.35     11.43     10.22      9.25      8.23      6.80      5.58      4.66
 37     14.41     13.63     12.43     11.51     10.29      9.31      8.29      6.85      5.63      4.70
 38     14.53     13.73     12.51     11.59     10.36      9.37      8.35      6.90      5.67      4.74
 39     14.64     13.82     12.59     11.67     10.43      9.43      8.41      6.95      5.71      4.78
 40     14.75     13.91     12.67     11.75     10.50      9.49      8.46      6.99      5.75      4.82
 41     14.85     14.00     12.75     11.83     10.57      9.55      8.51      7.03      5.79      4.86
 42     14.95     14.09     12.83     11.90     10.64      9.61      8.56      7.07      5.83      4.90
 43     15.05     14.17     12.90     11.97     10.70      9.67      8.61      7.11      5.87      4.94
 44     15.15     14.25     12.97     12.04     10.76      9.72      8.66      7.15      5.91      4.98
 45     15.24     14.33     13.05     12.11     10.82      9.77      8.71      7.19      5.95      5.01
 46     15.33     14.40     13.12     12.18     10.88      9.82      8.76      7.23      5.99      5.04
 47     15.42     14.47     13.19     12.25     10.94      9.87      8.81      7.27      6.03      5.07
 48     15.50     14.54     13.26     12.32     11.00      9.92      8.85      7.31      6.07      5.10
 49     15.58     14.60     13.32     12.38     11.06      9.97      8.89      7.35      6.11      5.13
 50     15.66     14.66     13.38     12.44     11.11     10.02      8.93      7.39      6.14      5.16
 51     15.73     14.72     13.44     12.50     11.16     10.07      8.97      7.43      6.17      5.19
 52     15.80     14.79     13.50     12.56     11.21     10.12      9.01      7.47      6.20      5.22
 53     15.87     14.85     13.56     12.62     11.26     10.17      9.05      7.51      6.23      5.25
 54     15.93     14.91     13.62     12.68     11.31     10.21      9.09      7.55      6.26      5.28
 55     15.99     14.97     13.67     12.73     11.36     10.25      9.13      7.59      6.29      5.31
 56     16.05     15.03     13.72     12.78     11.40     10.29      9.17      7.63      6.32      5.34
 57     16.11     15.10     13.77     12.83     11.44     10.33      9.21      7.67      6.35      5.37
 58     16.17     15.16     13.82     12.88     11.48     10.37      9.25      7.70      6.38      5.40
 59     16.23     15.22     13.87     12.93     11.52     10.41      9.29      7.73      6.41      5.43
 60     16.29     15.28     13.92     12.97     11.56     10.45      9.33      7.76      6.44      5.46
 61     16.34     15.34     13.97     13.01     11.60     10.49      9.37      7.79      6.47      5.48
 62     16.39     15.40     14.02     13.05     11.64     10.53      9.41      7.82      6.50      5.50
 63     16.44     15.46     14.06     13.09     11.67     10.57      9.45      7.85      6.53      5.52
Suppose $Z_j$ is significant at level $\alpha_p$ but not at level $\alpha_f$. Continue testing
$Z_{j+1}, \ldots$ until some $Z_k$ is reached that tests significant at $\alpha_f$, where only the
first $j - 1$ mean squares were judged to be from null populations:

$$U_j = \frac{j Z_k}{Z_1 + \cdots + Z_{j-1} + Z_k} \tag{10}$$

If some $Z_k$ is significant at level $\alpha_f$, all larger mean squares are also concluded to
be significant.
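The complete procedure might be coded as below. It has $m$ assumed-null mean squares, preliminary tests at $\alpha_p$ by eq. (8), and final tests at $\alpha_f$ by eqs. (9) and (10). This is a sketch under our own reading of the text, and the function and table names are hypothetical. Only the table I columns for $\alpha = 0.25$ and $0.05$ with $j \le 15$ are included, which is enough for a $2^4$ experiment. $\alpha_p = 1.0$ is handled as the fixed-denominator case, for which eq. (10) with $j = m + 1$ reduces to eq. (9).

```python
# Upper percent points of U_j for j = 2..15, copied from table I.
U_CRIT = {
    0.25: {2: 1.923, 3: 2.527, 4: 2.949, 5: 3.287, 6: 3.57, 7: 3.83, 8: 4.06,
           9: 4.27, 10: 4.45, 11: 4.62, 12: 4.77, 13: 4.92, 14: 5.05, 15: 5.17},
    0.05: {2: 1.99687, 3: 2.904, 4: 3.625, 5: 4.21, 6: 4.68, 7: 5.09, 8: 5.44,
           9: 5.75, 10: 6.03, 11: 6.28, 12: 6.50, 13: 6.71, 14: 6.91, 15: 7.07},
}


def chain_pooling(z, m, alpha_p, alpha_f):
    """Number of mean squares judged significant by chain pooling.

    z: single-df mean squares, grand mean excluded (at most 15 here).
    m: number of smallest mean squares assumed null (m >= 1).
    alpha_p: preliminary level; 1.0 means no preliminary tests (eq. (9)).
    alpha_f: final level.
    """
    zs = sorted(z)
    n = len(zs)
    crit_f = U_CRIT[alpha_f]
    j = m + 1
    if alpha_p < 1.0:
        crit_p = U_CRIT[alpha_p]
        # Preliminary tests, eq. (8): pool Z_j while it is nonsignificant.
        while j <= n and j * zs[j - 1] / sum(zs[:j]) <= crit_p[j]:
            j += 1
        if j > n:
            return 0
    # Final tests, eqs. (9)-(10): the denominator Z_1..Z_{j-1} is frozen.
    base = sum(zs[:j - 1])
    for k in range(j, n + 1):
        if j * zs[k - 1] / (base + zs[k - 1]) > crit_f[j]:
            return n - k + 1      # Z_k and every larger mean square
    return 0


z = [0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 1.0, 1.5, 30, 40, 50, 60, 80, 100, 120]
print(chain_pooling(z, m=2, alpha_p=0.25, alpha_f=0.05))  # 7
print(chain_pooling(z, m=2, alpha_p=1.0, alpha_f=0.05))   # 7
```

In the hypothetical 15-mean-square example, both variants flag the seven large mean squares. The preliminary tests simply allow the denominator to grow to eight pooled items before the final test is made.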


Computation of $U_j$ Distribution

The chain pooling procedures require nominal significance levels in addition to
those given in reference 8. Critical points of $U_j$ were obtained with the use of a Monte
Carlo method, as described in reference 9, and are given in table I. These points are
compared in table II with points computed from tables of Cochran's statistic, and the
comparison suggests that the results agree to better than one unit in the next-to-last
figure reported.

TABLE II. - COMPARISON OF PERCENTAGE POINTS OF STATISTIC $U_j$
OBTAINED FROM MONTE CARLO COMPUTATIONS WITH PERCENTAGE
POINTS OF $U_j$ COMPUTED FROM TABLES OF COCHRAN'S STATISTIC^a

(Rows: number of denominator mean squares, $j$. For each nominal test size, $\alpha$, the
columns give the critical point of $U_j$ from Cochran's statistic, the Monte Carlo critical
point, and their difference, Monte Carlo minus Cochran.)

                 Nominal test size, 0.01              Nominal test size, 0.05
  j     Cochran  Monte Carlo  Difference      Cochran  Monte Carlo  Difference
  2      1.9998      1.99986      0.0001       1.9970      1.99687     -0.0001
  3      2.9799       2.9809      0.0010       2.9007        2.904       0.003
  4      3.8704        3.870       0.000       3.6260        3.625      -0.001
  5      4.6395         4.65        0.01       4.2060         4.21        0.00
  6      5.2968         5.31        0.01       4.6848         4.68        0.00
  7      5.8632         5.87        0.01       5.0897         5.09        0.00
  8      6.3560         6.35       -0.01       5.4384         5.44        0.00
  9      6.7896         6.78       -0.01       5.7465         5.75        0.00
 10      7.1750         7.17      -0.005       6.0200         6.03        0.01
 12      7.8336         7.87        0.04       6.4920         6.50        0.01
 15      8.6205         8.66        0.04       7.0635         7.07        0.01
 20      9.5980         9.51       -0.09       7.7880         7.78       -0.01
 24     10.1928        10.14       -0.05       8.2416         8.24        0.00
 30     10.8960        10.88       -0.02       8.7870         8.81        0.02
 40     11.7600        11.75       -0.01       9.4800         9.49        0.01
 60     12.9060        12.97        0.06      10.4220        10.45        0.03

^a Ref. 8.
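The Monte Carlo program of reference 9 is not reproduced here. A minimal sketch of the idea assumes that, under the null hypothesis, the $j$ denominator mean squares are independent $\sigma^2\chi^2(1)$ variates. This is the same assumption under which table II compares $U_j$ with $j$ times Cochran's statistic. The sketch draws many such samples with Python's `random` module and takes the empirical upper $100\alpha$-percent point of $U_j$. The sample size, seed and function name are arbitrary choices of ours.

```python
import math
import random


def u_critical_point(j, alpha, n_rep=50_000, seed=12345):
    """Monte Carlo estimate of the upper 100*alpha percent point of U_j.

    Under the null hypothesis the j mean squares are independent
    sigma**2 * chi-square(1) variates. sigma cancels in U_j, so sigma = 1
    and each mean square is the square of a standard normal deviate.
    """
    rng = random.Random(seed)
    us = []
    for _ in range(n_rep):
        z = [rng.gauss(0.0, 1.0) ** 2 for _ in range(j)]
        us.append(j * max(z) / sum(z))
    us.sort()
    # Empirical (1 - alpha) quantile of the simulated U_j values.
    idx = min(n_rep - 1, math.ceil((1.0 - alpha) * n_rep) - 1)
    return us[idx]


print(round(u_critical_point(2, 0.25), 3))  # table I gives 1.923
print(round(u_critical_point(4, 0.25), 3))  # table I gives 2.949
```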


EVALUATION OF CHAIN POOLING PROCEDURES 
Risk Curves 


The relative importance of type 1 and type 2 errors depends on the application 
so that a general evaluation of a decision procedure cannot combine the type 1 and 
type 2 losses. A procedure is usually chosen that involves nonzero probabilities of 
both types of errors, but the losses associated with one or the other type may dictate 
that a strong effort be made to control just one of the two types. A compromise is 
chosen where a lowered probability of one type of error will increase the probability 
of the other. An evaluation of the operating characteristics of a multiple decision pro- 
cedure must therefore exhibit the set of possible compromises. 

The operating characteristics of the chain pooling procedures will be determined 
by Monte Carlo methods. The results, therefore, will show the relative frequency with 
which type 1 and type 2 decision errors have been made. Multiplying the observed 
relative frequencies by the type 1 and type 2 losses, as given in equations (4) and (7), 
produces quantities that will be called the observed type 1 and type 2 risks. 
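One way to turn simulated decisions into observed risks is sketched below. It is an illustration resting on assumptions of ours rather than the report's code.

- The type 1 risk is taken as the fraction of (experiment, null population) pairs that were selected, following the unit loss of eq. (4).
- The type 2 risk is taken as the average, over real effects, of the weight $\delta_i^2$ times the miss indicator, following the weighted loss of eq. (7). The weight $\delta_i^2$ is taken proportional to $\mu_i^2$ and scaled to a mean of 1 over the real effects.

The function name and input layout are hypothetical.

```python
def observed_risks(mu, decisions):
    """Observed type 1 and type 2 risks from simulated decisions.

    mu: true coefficients mu_1..mu_n of the mean squares tested (0 for a
        null population); at least one zero and one nonzero entry.
    decisions: one list of booleans per simulated experiment, True where
        the i-th mean square was declared significant.
    """
    null = [i for i, m in enumerate(mu) if m == 0]
    real = [i for i, m in enumerate(mu) if m != 0]
    # Type 2 weights delta_i**2: proportional to mu_i**2, mean 1 over real effects.
    mean_sq = sum(mu[i] ** 2 for i in real) / len(real)
    weight = {i: mu[i] ** 2 / mean_sq for i in real}
    n_exp = len(decisions)
    # Unit loss for each null population selected (eq. (4)).
    r1 = sum(d[i] for d in decisions for i in null) / (n_exp * len(null))
    # Weighted loss for each real effect not selected.
    r2 = sum(weight[i] for d in decisions for i in real if not d[i]) / (n_exp * len(real))
    return r1, r2


mu = [0.0, 0.0, 1.0, 2.0]
decisions = [[True, False, True, True], [False, False, False, True]]
print(observed_risks(mu, decisions))  # (0.25, 0.1)
```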

For a given procedure, a set of choices (such as differing values of nominal $\alpha$)
could produce a set of pairs of risk values $R_1$ and $R_2$, as indicated for a hypothetical
example by the set of points making up curve A of figure 1. On the other hand, a
different procedure could lead to a set of risk values such as those of curve B. Over
the range C, curve B has a lower $R_2$ than curve A for any given $R_1$ or, equivalently, a
lower $R_1$ for any given $R_2$. The procedure resulting in curve B is therefore preferred,
over the range C, to the procedure that gave curve A.

[Figure 1. - Considerations in evaluation of pooling procedures. (Risk curves A and B
over range C; axes: type 1 risk and type 2 risk, each from 0 to 1.0.)]


Parameters of Risk Curves 

A digital computer (IBM 7094) was used to generate pseudonormal random variables,
as described in reference 9. For $2^\ell$ treatments, $2^\ell$ such numbers were used with
Yates' algorithm to compute mean squares. The use of Yates' algorithm with
pseudonormal variates gives contrasts that are the sum of $2^\ell$ approximately normal variates.
The central limit theorem therefore implies that an improved approximation to
normality was obtained, compared with what would have been obtained if pseudo-$\sigma^2\chi^2(1)$ variates had
been generated more directly.

Suppose that the contrasts computed with Yates' algorithm are listed in their order
of computation. As in typical experiments with real data, the first mean square, for the
total or grand mean, is excluded from further consideration. Each subsequent contrast
is augmented by the addition of an increment, $2^\ell \delta_i \theta \sigma$, for $i = 1, \ldots, p$; $p \le 2^\ell - 2$.
In terms of equation (5), the increment is $2^\ell \mu_i$; that is, the parameter estimated by
the contrast has been given the value $\mu_i$. The mean squares therefore have
noncentrality parameters as given by equation (3).
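A sketch of one such simulated experiment follows. It assumes that $\sigma$ is supplied directly and that the real effects $\mu_1, \ldots, \mu_p$ are attached to the first $p$ contrasts after the grand mean, as the text describes. Python's `random.gauss` stands in for the pseudonormal generator of reference 9, and the function names are hypothetical.

```python
import random


def yates_contrasts(y):
    """Yates' algorithm for 2**ell responses in standard order."""
    n = len(y)
    col = [float(v) for v in y]
    for _ in range(n.bit_length() - 1):
        pairs = range(0, n, 2)
        col = [col[k] + col[k + 1] for k in pairs] + [col[k + 1] - col[k] for k in pairs]
    return col


def simulated_mean_squares(ell, mu, sigma=1.0, rng=None):
    """Mean squares Z_1..Z_n (Yates' order) of one simulated experiment.

    mu holds the real effects mu_1..mu_p assigned to the first p
    contrasts after the grand mean; all other contrasts are null.
    """
    rng = rng or random.Random()
    n = 2 ** ell
    if len(mu) > n - 2:
        raise ValueError("p must not exceed 2**ell - 2")
    c = yates_contrasts([rng.gauss(0.0, sigma) for _ in range(n)])
    for i, m in enumerate(mu, start=1):
        c[i] += n * m                      # increment 2**ell * mu_i
    return [ci * ci / n for ci in c[1:]]   # grand mean excluded


z = simulated_mean_squares(4, [0.5, 0.4, 0.3], rng=random.Random(7))
print(len(z))  # 15 mean squares for a 2**4 experiment
```

With $\sigma$ set to zero, each mean square reduces to $2^\ell\mu_i^2$, which is the $\sigma^2\lambda_i$ of eq. (3). This gives a quick check of the bookkeeping.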

In the analysis of a physical experiment, the two types of parameters are the
unknown parameters of the populations as chosen by "nature" and the parameters of the
ANOVA strategy as chosen by the statistician. The unknown parameters are $p$, $\sigma^2$,
and $\lambda_i$, $i = 1, \ldots, p$. The parameters assigned by the statistician in the case of
chain pooling are $m$, $\alpha_p$, and $\alpha_f$. The generation of Monte Carlo experiments requires
that values also be assigned to $p$, $\sigma^2$, and $\lambda_i$. Because the procedure is scale
invariant, the investigation can be simplified by setting $\sigma^2 = 1$.

The values of $p$ and $\lambda_i$ are to be chosen so that they will impose a severe burden
on the available strategies. Application of a wide variety of strategies should then
provide a demarcation of the superior strategies. The strategy evaluations should be
based on combinations of $\lambda_i$ values that are especially unfavorable with respect to
type 2 errors. A complete investigation of the operating characteristics of the multiple
decision procedures would evaluate $p$ type 2 error probabilities as joint functions of
$\lambda_1, \lambda_2, \ldots, \lambda_p$. Such an investigation would not be readily interpretable. The
problem is simplified by defining a type 2 risk as the expectation, over the experiment,
of losses due to type 2 errors. Obviously, the type 2 risk will be sensitive to the
distribution of the $\lambda_i$ over $i = 1, \ldots, p$. The scope of the investigation is reduced to
manageable size by considering only such distributions of the $\lambda_i$ as should result in
especially unfavorable operating characteristics.


Unfavorable Distribution of Parameters 

If all the $Z_j$ had come from one central $\sigma_0^2\chi^2(1)$ distribution, the correct decisions
would be that all $\lambda_i = 0$. The order statistics $Z_j$ would have some set of expectations
$\sigma_0^2 \zeta_j$. On the other hand, suppose that the $Z_j$ were drawn from $n$ noncentral
chi-square populations, each having a noncentrality parameter such that

$$(1 + \lambda_i)\sigma^2 = \sigma_0^2 \zeta_j; \qquad i, j = 1, \ldots, n$$

The resulting mean squares would thus have expectations equal to the expectations of
the order statistics of the central $\sigma_0^2\chi^2(1)$ distribution. This equating of expectations
of mean squares from $n$ noncentral populations to the expectations of the ordered
observations of the single central population implies that, for the noncentral populations,
the multiple decision procedure might have selection probabilities (power functions) no
larger than the type 1 error probabilities for the central $\sigma_0^2\chi^2(1)$ population. Such a
set of $\lambda_i$ values would therefore constitute a distribution that would be unfavorable to
the selection probabilities under chain pooling.

From equations (3) and (5),

$$\lambda_i = \frac{2^\ell \mu_i^2}{\sigma^2} = 2^\ell \theta^2 \delta_i^2$$

Therefore, an unfavorable distribution of the $\lambda_i$ can be obtained by setting

$$\delta_i^2 = \zeta_i$$

where the $\zeta_i$ are the expectations of the order statistics of a sample of size $p$ from
the central $\chi^2(1)$ distribution. (These values of $\delta_i^2$ satisfy eq. (6).) Expectations
of order statistics from a gamma distribution with scale parameter 1, shape parameter 1/2,
and many sample sizes were tabulated in reference 10. Multiplying these values by 2 gives
the expectations of the order statistics of the central $\chi^2(1)$ distribution. The Monte Carlo
experiments were performed with such unfavorable sets of $\lambda_i$ values, except that the
number $p$ of nonzero $\lambda_i$ was less than $n$.

In general, efficient experimentation is achieved when η is small in comparison with p, and this is achieved when the experimenter uses his prior knowledge to choose the fractional factorial design that results in most of the mean squares being significant. Correspondingly, values of η of 4, 6, and 9 were investigated for 2^4 treatment combinations; values of 6, 9, and 14 were investigated for the 2^5 case; and values of 13, 23, and 33 were investigated for the 2^6 case. All the distributions of the λ_i were obtained from reference 10 for values of p = 6 to p = 40. For the single case of p = 50, the distribution of λ_i was obtained with the use of an approximation to the c_j, as described in reference 9.

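The fixing of the distribution of the λ_i allows the type 2 risk to be investigated as a function of a single parameter λ̄, where λ̄ is the mean of the λ_i over i = 1, ..., p:

$$\bar{\lambda} = \frac{1}{p}\sum_{i=1}^{p}\lambda_i \qquad (11)$$

From equations (3), (5), (6), and (11),

$$\bar{\lambda} = \frac{1}{p}\sum_{i=1}^{p}\lambda_i = \frac{1}{p}\sum_{i=1}^{p}\frac{2^{\ell}\beta_i^2}{\sigma^2} = \frac{2^{\ell}}{p}\sum_{i=1}^{p}\theta_i^2 = 2^{\ell}\,\bar{\theta}^2 \qquad (12)$$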
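A minimal Python sketch of the unfavorable parameter set follows. Rather than the doubled gamma-distribution tables of reference 10, it estimates the c_j by simulation (an assumption made only to keep the example self-contained), then sets λ_i = λ̄ c_{p−i+1}, which by equation (12) is the same as 2^ℓ θ̄² c_{p−i+1}. The function names, replication count, and seed are illustrative.

```python
import random


def expected_chi2_order_stats(p, reps=20000, seed=1):
    """Monte Carlo estimate of c_1 <= ... <= c_p, the expected order
    statistics of a sample of size p from the central chi-square(1)
    distribution (the square of a standard normal variate)."""
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(reps):
        sample = sorted(rng.gauss(0.0, 1.0) ** 2 for _ in range(p))
        for j, value in enumerate(sample):
            totals[j] += value
    return [t / reps for t in totals]


def unfavorable_lambdas(p, lam_bar, reps=20000, seed=1):
    """Noncentrality parameters lambda_i = lam_bar * c_{p-i+1}, i = 1..p.

    The c_j average to E[chi-square(1)] = 1, so the mean of the returned
    values is close to lam_bar, as equation (12) requires."""
    c = expected_chi2_order_stats(p, reps, seed)
    # 0-based c[p - i] is c_{p-i+1} in the 1-based notation of the text
    return [lam_bar * c[p - i] for i in range(1, p + 1)]
```

For example, `unfavorable_lambdas(11, 64.0)` returns eleven decreasing values whose mean is close to 64; the largest λ_i goes to the first effect in Yates' order.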


Minimax and Bayes Strategies

Monte Carlo investigations to select from many strategies (m, α_p, α_f) the strategy best against a very unfavorable distribution of the λ_i should result in what might be called an "empirical minimax procedure." The problem may also be approached from a Bayesian point of view. It would require the assumption of a prior probability distribution of the effects β_i. The β_i in an experimental situation might be the additive result of several diverse sources. The β_i could then be regarded as a sample of size p from an approximate prior normal distribution with mean zero, and in this situation it was assumed that nature is a disinterested opponent. The ordered θ_i² would then be the order statistics of a sample of size p from a scaled χ²(1) distribution. In other words, if nature is a disinterested opponent, the same prior distribution of the θ_i² should occur as would be anticipated from an aggressive opponent. The Monte Carlo founded minimax procedure is then also the Monte Carlo founded Bayes procedure.


Scorekeeping for Monte Carlo Experiments 


After being augmented, the Z_i² are ordered in ascending rank, and the m smallest Z_i² are presumed to have been drawn from null populations. The next 2^ℓ − 1 − m mean squares are examined for significance in accordance with equations (9) or (10). A type 1 error is counted for the test of Z_i² if both (1) the test of Z_i² resulted in rejection of the null hypothesis, and (2) the particular Z_i² is a mean square that had not been augmented. A type 2 error is counted for the test of Z_i² if both (1) the test of Z_i² resulted in acceptance of the null hypothesis, and (2) the particular Z_i² is a mean square that had been augmented with θ_i² > 0. In this way, N experiments are analyzed, each containing p violations of the null hypothesis. In all cases, N = 1000, and for given ℓ, the same (1000)2^ℓ pseudonormal variates were used for every strategy investigated.
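The augmentation step can be sketched as below. The sketch assumes (the text does not spell this out) that each standardized effect contrast is a pseudonormal variate with unit error variance, shifted by √λ_i for the p augmented effects, so that Z_i² has expectation 1 + λ_i; the grand mean is omitted. Seeding each experiment by its index regenerates the same variates for every strategy, mirroring the common-variate design just described. `simulate_mean_squares` and its seed are hypothetical.

```python
import math
import random


def simulate_mean_squares(ell, lambdas, experiment, seed=12345):
    """Mean squares Z_i^2, i = 1..2**ell - 1 in Yates' order, for one
    simulated experiment with error variance sigma^2 = 1.

    The first p = len(lambdas) effects are augmented: their standardized
    contrast is shifted by sqrt(lambda_i), so Z_i^2 is noncentral
    chi-square(1, lambda_i) with expectation 1 + lambda_i. The remaining
    eta = 2**ell - p - 1 effects are null. Seeding on (seed, experiment)
    regenerates the same pseudonormal variates for every strategy."""
    n_effects = 2 ** ell - 1
    p = len(lambdas)
    if p > n_effects:
        raise ValueError("more augmented effects than mean squares")
    rng = random.Random(seed * 1_000_003 + experiment)
    squares = []
    for i in range(n_effects):
        shift = math.sqrt(lambdas[i]) if i < p else 0.0
        squares.append((rng.gauss(0.0, 1.0) + shift) ** 2)
    return squares
```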

The mean squares in Yates' order from i = p + 1 to i = 2^ℓ − 1 were not augmented. For these mean squares, and over the N experiments, the computer counts the number of type 1 errors and divides by N to report the observed type 1 error probability P_1i for the i-th mean square. The number of P_1i computations in any experiment is η = 2^ℓ − p − 1; however, not all η + p mean squares were tested. The m smallest mean squares were pooled before testing, and, therefore, only η − m opportunities should be expected for making type 1 errors.

As given in equation (4), the type 1 errors are defined as unit losses. The observed risk (average type 1 loss) is estimated by the P_1i as averaged over the experiment:

$$R_1(\eta) = \frac{1}{\eta - m}\sum_{i=p+1}^{2^{\ell}-1} P_{1i} = \frac{1}{2^{\ell} - p - m - 1}\sum_{i=p+1}^{2^{\ell}-1} P_{1i} \qquad (13)$$


The symbol η is attached to R_1 to represent a fact that will be developed later; that is, R_1 is mainly a function of η.

Equation (13) does not provide a strict probability. In any experiment, random fluctuations can cause some of the p augmented mean squares to have smaller values than some of the η mean squares not so augmented. If the strategy used a large value of m, some of the smallest augmented mean squares could be pooled into the initial denominator of the test statistic, and more than η − m of the null mean squares could be available for type 1 errors, which could result in R_1(η) being greater than 1.

The first p mean squares beyond the grand mean were augmented so that λ_i > 0. The number of type 2 errors for each i = 1, ..., p is counted over the N experiments and divided by N to report the observed type 2 error probability P_2i for the i-th augmented mean square. Multiplication of these probabilities by the losses, as given in equation (7), allows the observed risk to be computed over the experiment. Thus,

$$R_2(\bar{\lambda}) = \frac{1}{p}\sum_{i=1}^{p} P_{2i} \qquad (14)$$

The symbol λ̄ is attached to R_2 to indicate a fact that will be developed later; namely, R_2 is mainly a function of λ̄.

Details of the decision and scorekeeping procedure as it was built into a computer 
program are given in reference 9. 
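Equations (13) and (14) can be sketched as follows, given the set of effects declared significant in each of the N simulated experiments. This is a simplified reading: every augmented effect not declared significant is counted as a type 2 error (including any that were pooled initially and never tested), and equal type 2 losses are assumed in equation (14). The function name and input layout are illustrative.

```python
def observed_risks(ell, p, m, significant_sets):
    """Observed risks R1 (eq. 13) and R2 (eq. 14).

    significant_sets: one set per simulated experiment holding the 1-based
    Yates-order indices declared significant. Indices 1..p are augmented
    (lambda_i > 0); indices p+1..2**ell - 1 are null."""
    n_effects = 2 ** ell - 1
    eta = n_effects - p
    if not (1 <= p < n_effects and 0 <= m < eta):
        raise ValueError("need 1 <= p < 2**ell - 1 and 0 <= m < eta")
    if not significant_sets:
        raise ValueError("need at least one simulated experiment")
    n_exp = len(significant_sets)
    type1 = [0] * (n_effects + 1)  # index 0 (grand mean) unused
    type2 = [0] * (n_effects + 1)
    for declared in significant_sets:
        for i in range(1, n_effects + 1):
            if i > p and i in declared:
                type1[i] += 1  # null effect declared significant
            elif i <= p and i not in declared:
                type2[i] += 1  # real effect missed
    p1 = [type1[i] / n_exp for i in range(p + 1, n_effects + 1)]
    p2 = [type2[i] / n_exp for i in range(1, p + 1)]
    r1 = sum(p1) / (eta - m)  # equation (13)
    r2 = sum(p2) / p          # equation (14), equal losses assumed
    return r1, r2
```

For instance, in a 2^4 design with p = 11, m = 1, and two experiments, one perfect and one that misses effect 11 and falsely declares effect 13, the sketch gives R_1 = 0.5/3 and R_2 = 0.5/11.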


CHOICE OF PRELIMINARY TEST LEVEL 
Conditions of m, p, and λ̄

Those values of α_p that should be preferred for a wide variety of conditions will now be evaluated for several arbitrarily chosen values of m. A later step will determine preferred values of m, given that the α_p are already preferred values.

The values of p will be much larger than the value of p = 1 in Daniel's investigation (ref. 2). (Comparisons of chain pooling with the operating characteristics of procedures based on half-normal plotting were presented in ref. 9, in which the chain pooling procedures were shown to be superior to the results obtained by Daniel and by Birnbaum.)

Values of λ̄ were chosen to result in R_2(λ̄) values that cover the range 0.05 < R_2(λ̄) < 0.20.
Experiments of 2^4 Treatment Combinations

Some results for ℓ = 4, p = 6, η = 9, and m = 1 are shown in figure 2. The strategies consisted of m = 1 together with the values of α_p identified by the symbols and the values of α_f that identify the solid curves. The set of preferred strategies (the set of points nearest the origin) is the set that jointly minimizes R_1(η) and R_2(λ̄). These points are identified by the dashed curve drawn through them, and they include α_p = 0.25 at the smaller values of R_1(η), and α_p = 0.50 at the larger values of R_1(η). Similar results for m = 2 and m = 3 are presented in reference 9.
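Reading the preferred set off figure 2 amounts to keeping the strategies whose (R_1(η), R_2(λ̄)) points are not dominated by any other point. A minimal sketch of that filter follows, assuming non-dominance as the formal meaning of "jointly minimizes"; the risk values in the check are made up for illustration and are not taken from figure 2.

```python
def preferred_strategies(points):
    """Strategies whose (R1, R2) pair is not dominated by another pair.

    points maps a strategy label such as (m, alpha_p, alpha_f) to its
    observed (R1, R2). A pair is dominated when some other pair is no
    larger in both risks and strictly smaller in at least one."""
    front = {}
    for label, (r1, r2) in points.items():
        dominated = any(
            s1 <= r1 and s2 <= r2 and (s1 < r1 or s2 < r2)
            for other, (s1, s2) in points.items()
            if other != label
        )
        if not dominated:
            front[label] = (r1, r2)
    # list along the R1 axis, the way the dashed curve is read
    return dict(sorted(front.items(), key=lambda item: item[1][0]))
```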

An important implication of figure 2 is that a single value of α_p cannot be preferred for all values of α_f. However, on scanning the values of α_p that are identified as being preferred by their lying on the dashed curve, selections can be made of those values of α_p that should be preferred for given values of the abscissa R_1(η). For example, figure 2 shows that the vertical line at R_1(η) = 0.05 cuts the dashed curve between two points for which α_p = 0.25, while at R_1(η) = 0.30, α_p = 0.50 would be preferred. In some cases (such as R_1(η) = 0.20 of fig. 2) there is no clear choice between two values of α_p (α_p = 0.25 and α_p = 0.50 are equally preferred). Results similar to those for figure 2 are shown for η = 6 and for η = 4 in reference 9.

TABLE III. - PREFERRED NOMINAL SIZE OF PRELIMINARY POOLING TEST α_p

(a) 2^4 Treatment combinations

Number of mean   Average         Type 1    Nominal size of preliminary pooling test, α_p,
squares pooled   noncentrality   risk,     for number of null mean squares, η, of
before testing,  parameter,      R_1(η)
m                λ̄                         4              6              9

1                64              0.05      ᵃ0.25, 0.50    0.25           0.25
1                81              0.05      ᵃ0.25, 0.50    0.25           --
1                100             0.05      ᵃ0.25, 0.50    0.25           --
1                64              0.10      0.50           0.25, 0.50     0.25
1                81              0.10      0.50           0.25, 0.50     0.25
1                100             0.10      0.50           0.25, 0.50     0.25
1                64              0.20      0.50           0.50           0.25, ᵃ0.50
1                81              0.20      0.50           0.50           0.25, ᵃ0.50
1                100             0.20      0.50           0.50           0.25, ᵃ0.50
2                64              0.05      ᵃ0.50, 0.75    0.50           0.50
2                                0.10      0.75           0.50, ᵃ0.75    0.50, ᵃ0.75
2                                0.20      ᵃ0.75, 1.00    0.75           0.75
3                64              0.05      0.75           0.75           0.75
3                                0.10      ᵃ0.75, 1.00    ᵃ0.75, 1.00    0.75
3                                0.20      1.00           1.00           0.75, ᵃ1.00

(b) 2^5 Treatment combinations

Number of mean   Average         Type 1    Nominal size of preliminary pooling test, α_p,
squares pooled   noncentrality   risk,     for number of null mean squares, η, of
before testing,  parameter,      R_1(η)
m                λ̄                         6              9              14

1                64              0.05      0.50           0.50           0.25
1                64              0.10      0.50                          0.25
1                81              0.10      --             --             --
1                49              0.20                                    0.50
1                64              0.20      0.50                          0.50
1                81              0.20      --             --             --
3                16              0.05      0.75           --             --
3                64              0.05      0.75           0.75           0.75
3                16              0.10      0.75           --             --
3                64              0.10      0.75           0.75           0.75
3                16              0.20      1.00           --             --
3                64              0.20      1.00           1.00           0.75, ᵃ1.00
5                64              0.05      1.00           1.00           1.00
5                16              0.10      --             --             --
5                64              0.10                     1.00           1.00
5                16              0.20      --             --             --
5                64              0.20                     1.00           1.00

ᵃUse of this value makes α_p independent of η at given m.

The preferred or equally preferred values of α_p, as obtained for the conditions detailed in reference 9, are summarized in table III(a). This summary shows that the preferred values of α_p are completely independent of the noncentrality parameter λ̄ and that the preferred α_p can be selected to be independent of the number of null effects η.


Experiments of 2^5 Treatment Combinations

The type of evaluation of chain pooling strategies just described for the case of ℓ = 4 was also carried out for ℓ = 5. Results are presented in reference 9. The implications of these results for preferred values of α_p are given in table III(b). This table shows again that the preferred values of α_p are independent of the average noncentrality parameter λ̄ and that they are almost independent of the number of null mean squares η.


Experiments of 2^6 Treatment Combinations

Chain pooling strategies for ℓ = 6 were investigated at values of p = 50, 40, and 30, and the corresponding values of η were 13, 23, and 33. Results with m = 1 and two values of λ̄ are shown by figures 3(a), 3(b), and 3(c), respectively. The results for η = 13 and for η = 23 (figs. 3(a) and 3(b)) show that with m = 1, the preferred value of α_p is 1.0, as shown by the dashed line. A preferred value of α_p = 1.0 is also shown with η = 33 by figure 3(c) for λ̄ = 64, but figure 3(c) shows that other values of α_p are preferred when λ̄ = 256. The desirability of using small values of α_p, shown by figure 3(c) at large η and λ̄, is a nontypical phenomenon discussed in detail in reference 9. In brief, if, first, the value of η is large relative to p, and if, second, η is large on an absolute basis, and if, third, λ̄ is large enough to ensure low type 2 losses, the preferred value of α_p can be quite small. Such a favorable combination of p, η, and λ̄ would not ordinarily be known to exist a priori, and the use of small values of α_p cannot be regarded as a generally useful strategy. However, if the Monte Carlo investigations show that two values of α_p are equally preferred for the especially unfavorable distributions of θ_i², the smaller of the two α_p values could be used for the ANOVA of real experiments on the chance that the θ_i² might have a more favorable distribution. (Some equally preferred pairs of α_p values are listed in table III.)

The use of values of m > 1 with ℓ = 6 is illustrated by figure 4 for m = 3. These results show that the preferred value of α_p is 1.0, but that for the value of λ̄ = 64 (which gives reasonable values of R_2(λ̄)) there were no values of α_f small enough to give values of R_1(η) as small as might reasonably be desired.

Figure 3. - Risk curves. ℓ = 6 and m = 1. (a) p = 50, η = 13. (b) p = 40, η = 23. (c) p = 30, η = 33.

Figure 4. - Risk curves. ℓ = 6, λ̄ = 64, p = 40, η = 23, and m = 3.


NUMBER OF MEAN SQUARES INITIALLY POOLED 

The results that give preferred values of α_p for arbitrarily chosen values of m have been presented. Now, the results that will determine the preferred values of m are presented for values of α_p that are already preferred.

Experiments of 2^4 Treatment Combinations

Some risk curves with η = 4 are presented in reference 9. These results showed preferred values of α_p (which were independent of λ̄) for m = 1, 2, and 3. These results are displayed in figure 5(a), but only for the single value of the noncentrality parameter λ̄ = 64 (which gives reasonable values of R_2(λ̄)). The curves drawn through the preferred α_p points of figure 5(a) show that m = 2 or m = 3 is greatly preferred over m = 1, but that m = 3 is only a slight improvement over m = 2.
Figure 5. - Risk curves. ℓ = 4. (a) p = 11, η = 4, λ̄ = 64. (b) p = 9, η = 6.

At a larger value of η, the risk curves for η = 6 are shown in figure 5(b) for preferred values of α_p and for values of m of 1, 2, 3, 4, and 5. The strategy of m = 4 is seen to be preferred over the important range 0.05 ≤ R_1(η) ≤ 0.20.

Results for η = 9 are shown in figure 5(c) for preferred values of α_p, and for values of m of 4, 6, and 8. The strategy of m = 6 is seen to be preferred over the important range 0.05 ≤ R_1(η) ≤ 0.20.

Experiments of 2^5 Treatment Combinations

Results with 2^5 treatment combinations for λ̄ = 32, η = 6, and m = 3, 4, and 5 are shown in figure 6(a) for preferred values of α_p. In the important range 0.05 ≤ R_1(η) ≤ 0.20, these results show that m = 3 is the preferred strategy. Similar results for η = 9 are shown in figure 6(b), and these results show m = 5 or m = 6 to be a preferred strategy. For η = 14, figure 6(c) shows that m = 7 is the best strategy over the range 0.05 ≤ R_1(η) ≤ 0.20.


Experiments of 2^6 Treatment Combinations

The results achieved with ℓ = 6 and with m = 1 and 3 (figs. 3 and 4) showed that α_p = 1.00 was a preferred strategy with m = 1 but that, for η = 23, there were no suitable α_p and α_f values with m = 3. The large value of η = 23 suggests that suitable values of α_p and α_f might be found at values of m much larger than m = 3. Results for λ̄ = 64 and for α_p = 1.00 and 0.75 are shown for a wide variety of values of m by figure 7(a). These computations were performed by using the smallest available value of α_f, namely, α_f = 0.001, because the results in figure 4 show that achieving values of R_1(η) in the (desired) range down to 0.05 would be very difficult with m > 1.

The points displayed in figure 7(a) show that the strategies with α_p = 1.00 are better than those with α_p = 0.75.

With the use of the preferred value α_p = 1.00, as indicated by figure 7(a), and a preferred set of m values as suggested by the lower branch of the curve of figure 7(a) (namely, m = 5, 10, 15, 20, and 21), the question of preferred α_f values was examined for the conditions of figure 7(b). The results presented in figure 7(b) show that, in any attempt to control R_1(η) in the range of 0.05 to 0.20, the preferred strategy consists of using α_f at its smallest available value (namely, α_f = 0.001). The control of R_1(η) is then accomplished by the selection of a suitable value of m.

The case of p = 50 and η = 13 was examined for α_p = 1.0, α_f = 0.001, and a wide variety of values of m, as shown by figure 8(a). These results show that m = 2, 3, 4, and 6 constitute an attractive set of m values. The selection of preferred α_f values for such a set of m values was examined, with results shown in figure 8(b). These results are consistent with those shown in figure 7(b) for η = 23; namely, the strategy of selecting α_f at its smallest value (α_f = 0.001) and controlling R_1(η) through the choice of m dominates the strategy of fixing m and controlling R_1(η) by selecting an α_f.

Figure 7. - Risk curves. ℓ = 6, p = 40, and η = 23.

Results for η = 33 are shown in figure 9(a) for α_p = 1.0, α_f = 0.001, and a wide variety of values of m. The results are similar to those of figures 7(a) and 8(a) in that the risk curve has a turning point at a critical value of m. For values of m in excess of the critical value, there is a sequence of m values along a lower branch of the risk curve, and such a sequence constitutes a set of preferred m values, where increasing the value of m within the preferred set reduces the value of R_1(η) at the cost of increased R_2(λ̄). Figure 9 shows that setting α_f at 0.001 and controlling R_1(η) through m is a preferred strategy for η = 33, which is consistent with the results already cited for η = 13 and η = 23.

From inspection of figures 7(a), 8(a), and 9(a), the preferred ranges of m are presented in the following table:

Number of null      Preferred range of number of mean
mean squares, η     squares pooled before testing, m

13                  2 to 6
23                  6 to 21
33                  6 to 30

The total results for 2^6 design experiments suggest that acceptable strategies consist of using m = 1, α_p = 1.0, and α_f values small enough to give satisfactory values of R_1(η). Values of m > 1 can be used, but if they are, then according to the preceding table they should be at least as large as about (1/5)η, and then α_p = 1.0, α_f = 0.001, and values of m large enough to give satisfactory values of R_1(η) should be used.


Overall Procedure 

A reasonable strategy for the chain pooling ANOVA of a 2^ℓ design experiment where ℓ = 4 or 5 might consist of performing a preliminary analysis with m = 1 and whatever α_p value is shown by existing results (table III) to be preferred for the statistician's prior guess of the value of η. (For ℓ = 6 and m = 1, the preferred α_p is equal to 1.0.) From the results of the chain pooling ANOVA with m = 1, the statistician will have a posterior estimate η̂ of η and should use this value of η̂ to pick a preferred value of m that will be greater than 1.

TABLE IV. - RATIO m/η FOR GIVEN η AND PREFERRED m

Number of treatment   Number of null     Number of mean squares        m/η
combinations, 2^ℓ     mean squares, η    pooled before testing, m

2^4                   4                  3                             0.75
2^4                   6                  4                             0.67
2^4                   9                  6                             0.67
2^5                   6                  3                             0.50
2^5                   9                  5                             0.56
2^5                   14                 7                             0.50

A rule is needed for picking a value of m > 1 after η̂ has been determined. The preferred values of m, as evidenced by figures 5 and 6 for ℓ = 4 and ℓ = 5, are presented in table IV together with the associated ratios m/η. The modal value for ℓ = 4 is 2/3, and the modal value for ℓ = 5 is 1/2. These values suggest the following total procedure: Perform the chain pooling ANOVA with m = 1 and the preferred α_p for the guessed value of η. The result will be an estimate η̂ of η. If ℓ = 4, let m be the integer nearest (2/3)η̂. If ℓ = 5, let m be the integer nearest (1/2)η̂. With this new value of m, perform a second ANOVA of the observations with α_p = 1.0. In the case of ℓ = 6, the ANOVA with m > 1 would be performed with α_p = 1.0, α_f = 0.001, and a value of m based on η̂, where the exact value of m is chosen to control R_1(η).
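The second-stage rule for ℓ = 4 and 5 can be written as a small function. Exact halves are rounded up, an assumption chosen because it reproduces every m in table IV (for example, (1/2)·9 = 4.5 gives m = 5); exact fractions avoid floating-point surprises at the tie. The function name is hypothetical.

```python
import math
from fractions import Fraction

# Modal ratios m / eta from table IV: 2/3 for 2^4 designs, 1/2 for 2^5
MODAL_RATIO = {4: Fraction(2, 3), 5: Fraction(1, 2)}


def second_stage_m(ell, eta_hat):
    """m for the second chain pooling ANOVA, given the estimate eta_hat
    of the number of null mean squares from the m = 1 analysis."""
    if ell not in MODAL_RATIO:
        raise ValueError(
            "rule given only for ell = 4 or 5; for ell = 6 use alpha_p = 1.0, "
            "alpha_f = 0.001 and choose m to control R1"
        )
    # integer nearest ratio * eta_hat, with exact halves rounded up
    return math.floor(MODAL_RATIO[ell] * eta_hat + Fraction(1, 2))
```

For very small η̂ the formula can return m = 1, so a caller would still need to enforce the requirement m > 1 stated above.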

OPERATING CHARACTERISTICS 

The conservative strategy of using m = 1 has been shown to be a strategy that might well be used in a preliminary chain pooling ANOVA. The ANOVA with m = 1 would then give an estimate η̂ of η that leads to a larger value of m, perhaps m = (2/3)η̂. The main interest is in the operating characteristics for m > 1. Risk curves for preferred values of m and α_p are illustrated in figures 8(a), 10, and 11(a) and (b). These curves map out detailed values of R_1(η) and R_2(λ̄) for a wide variety of values of η and λ̄, but they do not give a clear picture of the response of R_2(λ̄) to η and λ̄. This response is shown by plots of R_2(λ̄) against λ̄ for stated values of R_1(η) and η. The plots were obtained by reading curves such as those in figures 8(a), 10, and 11(a) and (b) for the given values of η at particular abscissa values, namely at R_1(η) = 0.05, 0.10, and 0.20. The values of R_2(λ̄) so read were plotted as functions of λ̄, as illustrated in figure 12. This plot shows that for preferred m and for fixed R_1(η), the values of R_2(λ̄) are strongly dependent on λ̄ and are only weakly dependent on η. This observation is the basis for using the symbol R_2(λ̄) to express the fact that R_2 is always a function of λ̄ but is mostly independent of η.

Figure 10. - Risk curves. ℓ = 4, p = 11, η = 4, and m = 3.

The general performance of the preferred strategies (m > 1) was illustrated by figure 12. This plot shows type 2 risks as a function of the average noncentrality parameter λ̄ for stated levels of ℓ, η, and the type 1 risks. These curves permit reading values of λ̄ for specified values of the type 2 risks. Values of λ̄ for values of R_2(λ̄) of 0.05, 0.10, and 0.20 were read and plotted as functions of η for stated values of R_1(η) and R_2(λ̄), as illustrated by figure 13. For preferred m, figure 13 shows that, at the stated risks R_1(η) and R_2(λ̄), the detectable values of λ̄ generally decrease with increasing η.


EFFICIENCY AS FUNCTION OF EXPERIMENT SIZE 

The purpose of the multiple decision procedure with respect to a 2^ℓ design experiment is to detect the nonnull populations, p in number. The quantity signaling that a population is nonnull is the square of the effect parameter, β_i²; and the relative strength, or signal-to-noise ratio, is β_i²/σ². From equation (5), this ratio is also given by θ_i². Because of equation (6), the average value of β_i²/σ² was θ̄². At given levels of the risks R_1(η) and R_2(λ̄), the detection of the experiment is defined as the number of signals to be detected, p, divided by the average signal-to-noise ratio θ̄². The detection efficiency ψ is defined as the detection divided by the number of observations 2^ℓ, and from equation (12),

$$\psi = \frac{p/\bar{\theta}^2}{2^{\ell}} = \frac{p}{\bar{\lambda}} \qquad (15)$$

The quantity ψ thus provides a measure for comparing the efficiency of 2^4, 2^5, and 2^6 fractional factorial experiments with each other. The quantity λ̄ for given risk levels was plotted as curves, such as those of figure 13. The associated values of η have equivalent values of p according to p = 2^ℓ − η − 1. Values of λ̄ read from such curves, together with the associated values of p, were substituted in equation (15) to produce the values of ψ, the detection efficiency.
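As a worked instance of equation (15), with illustrative inputs rather than values read from figure 13: a 2^6 experiment with η = 23 has p = 64 − 23 − 1 = 40, and if λ̄ = 64 were needed to reach the stated risks, then θ̄² = 64/64 = 1 and ψ = 40/64 = 0.625. The same arithmetic in Python (function name hypothetical):

```python
def detection_efficiency(ell, eta, lam_bar):
    """Detection efficiency psi of equation (15) for a 2**ell design.

    p = 2**ell - eta - 1 real effects; psi = (p / theta_bar^2) / 2**ell,
    which equals p / lam_bar because lam_bar = 2**ell * theta_bar^2 (eq. 12)."""
    p = 2 ** ell - eta - 1
    if p <= 0 or lam_bar <= 0:
        raise ValueError("need p > 0 and lam_bar > 0")
    theta_bar_sq = lam_bar / 2 ** ell
    return (p / theta_bar_sq) / 2 ** ell
```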

This efficiency was shown to increase very rapidly from 2^4 to 2^6 design experiments. One implication of this result is that the experimenter should include all conceivable variables in his first design, with no intention of adding other variables to the investigation at a later date. Such a policy may result in a large number of test conditions, leading the experimenter to seek means of achieving economy. One obvious method of achieving economy is to avoid the use of replication, in which case the methods of chain pooling become essential for analyzing results. Another method of achieving economy is to use fractional replication, but fractional replication raises the question of whether significant interactions are being excluded from the model. Furthermore, large designs may not be performable under homogeneous conditions, and blocked designs may be needed. If a severely fractionated design is performed, and the experimenter wishes to augment the testing to evaluate additional interactions, special relations must exist between the old and the new test conditions so that the newly evaluated interactions will not be confounded with block effects. Sequences of designs that satisfy these relations are given in reference 5. The sequences are such that observations from the first block can be used to estimate the coefficients of a simple model and then be retained and combined with observations from new blocks, so that all acquired observations are used cumulatively to estimate models of successively greater generality.


ROBUSTNESS 

The subject of robustness was investigated by generating variates without trans- 
formation from the rectangular to the approximate normal distribution. The operating 
characteristics of the chain pooling procedure for the pseudorectangular distribution 
were in good agreement (ref. 9) with those for the pseudonormal distributions. 
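A sketch of the two variate sources follows, assuming the rectangular variates were centered and scaled to unit variance (the text does not say; since every chain pooling test is a ratio of mean squares, a common scale factor would not change the decisions anyway). The uniform distribution on [−√3, √3] has variance (2√3)²/12 = 1. Names are illustrative.

```python
import math
import random


def standardized_variates(n, kind, rng):
    """n variates with mean 0 and variance 1 from the chosen source.

    kind="normal": pseudonormal variates.
    kind="rectangular": uniform on [-sqrt(3), sqrt(3)], used without any
    transformation toward normality; its variance is (2*sqrt(3))**2/12 = 1."""
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "rectangular":
        half_width = math.sqrt(3.0)
        return [rng.uniform(-half_width, half_width) for _ in range(n)]
    raise ValueError(f"unknown kind: {kind!r}")
```

Feeding either source into the same simulation, in place of the pseudonormal variates, is one way to repeat the robustness comparison.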


CHARTS FOR CONTROL OF TYPE 1 RISKS 

Classical hypothesis testing enables the statistician to test an alternative hypothesis 
against a null hypothesis with a predetermined bound on the probability of a type 1 error. 
This natural formulation of a decision procedure can sometimes be applied to multiple 
decision procedures. In the case of chain pooling, the analogous procedure of putting 
a predetermined bound on the type 1 risks is not completely feasible. Partial feasibility 
is illustrated by figure 14. The choice, for example, of a strategy (m, α_p, α_f) leads to one of the dashed curves. These curves are seen to be nearly vertical for λ̄ ≥ 64. Thus, the choice of a fixed strategy, for instance, (m, α_p, α_f) = (1, 0.25, 0.01), would essentially fix R_1(η) at 0.06 (fig. 14) provided that λ̄ > 64; however, for λ̄ < 64, R_1(η) would increase with decreasing λ̄.

Figure 14 illustrates a situation where the type 1 losses are insensitive to λ̄ if λ̄ is sufficiently large. The type 2 risks had previously been shown to be mainly a function of λ̄. Figure 14 thus illustrates a situation where the type 1 risks are independent of the type 2 risks if the type 2 risks are sufficiently small. This fact is further illustrated by figure 15(a). This figure was obtained by reading the risk curves of reference 9, as illustrated by figure 14, to obtain values of R_1(η) that correspond to the specific type 2 risk levels R_2(λ̄) = 0.05, 0.10, and 0.20. The result is that the three curves for any given α_f lie fairly close to one another, independently of the three levels of R_2(λ̄). Figure 15(a) thereby provides the information by which an a priori guess of η can be used to select a value of α_f that should control the level of R_1(η) fairly independently of λ̄.

Risk curves with m = 1 and ℓ = 5 are given in reference 9. These curves were read at values of R_2(λ̄) of 0.05, 0.10, and 0.20 to produce the curves of figure 15(b). This figure also suggests that, with an a priori choice of η, the value of R_1(η) would not be especially sensitive to R_2(λ̄) or to λ̄.

A similar reading of curves for ℓ = 6, as given in reference 9, produced the curves of figure 15(c). The relation between R_1(η) and η is again seen to be essentially independent of R_2(λ̄).

With m > 1, the control of R_1(η) is much more difficult. For example, the lines of constant α_f in figure 10 have no regions where a fixed strategy results in essentially constant R_1(η). The same technique used in plotting figure 15 was used to plot figure 16 from data in reference 9 for strategies where both m and α_p were at preferred values. Here the values of R_1(η) for a given α_f are widely separated according to the value of R_2(λ̄).

Figure 16. - Type 1 risks with preferred m and α_p. (b) ℓ = 5, α_p = 1.0. (c) ℓ = 6, α_p = 1.0, α_f = 0.001.

A rigid relation between the type 1 and type 2 risks was illustrated by figure 14 in the region λ̄ > 64; that is, the choice of m, α_p, and α_f essentially fixed the value of R_1(η) independently of R_2(λ̄), as shown by the vertical direction of the curves of constant α_f. An elastic relation between the type 1 and type 2 risks is illustrated in figure 10, in that for any fixed strategy (m, α_p, α_f), R_1(η) varies with R_2(λ̄) in any range of λ̄. An increase in the value of λ̄ will simultaneously reduce R_1(η) and R_2(λ̄). In this situation, a statistician might desire some bound on the average probability of failing to detect real effects. Assume that he has estimated η from an initial ANOVA with m = 1. He can now enter figure 16(a) with this estimate η̂ and with the desired type 2 loss bound (a particular value of R_2(λ̄)). For such values of η̂ and R_2(λ̄), he can now choose a value of α_f to give some desired R_1(η). He can then say that if the average noncentrality parameter is large enough to hold the type 2 risks to the chosen bound, the selected α_f will be small enough to hold the type 1 risks to the chosen R_1(η). If the average noncentrality parameter is larger than what he had hoped for, both the type 2 and type 1 risks will be lower than specified. If the average noncentrality parameter is less than what would give small type 2 risks, he must accept increased type 1 risks. In any event, having chosen some strategy (m > 1, α_p, α_f), he must realize that it is an optimal strategy for some combination of the type 1 and type 2 risks, the only drawback being that, if the average noncentrality parameter is small enough to boost the type 2 risks, it will also boost the type 1 risks.

This elasticity between the type 1 and type 2 risks with m > 1 could be considered 
an advantage over the strategy with m = 1; that is, if the average noncentrality param- 
eter is smaller than what the statistician had hoped for, then, with the elastic type 1 
risk, the type 2 risks will be smaller than they would have been if the type 1 risk had 
been rigidly controlled by the strategy of m = 1. (Compare figs. 10 and 14.) 


CONCLUDING REMARKS 

Factorial experiments are essential when interactions among the factors can be important. Such experiments might be performed without replication if the experiments are expensive, as in alloy development or in the destructive testing of structures, or where many variables are involved, such as in high-temperature protective coating research. Not replicating produces an economy but leads to a situation where there is no well-established strategy for analyzing the results.

Some Monte Carlo investigations of varieties of chain pooling procedures for 2^4, 2^5, and 2^6 fractional factorial design experiments suggested that a good overall strategy consists of starting the analysis with just the single smallest mean square, which is assumed to have come from a null population. Testing proceeds at some level α_p in the order of increasing magnitude of the mean squares. Mean squares found not significant are pooled into the denominator of the test statistic. If a mean square is found significant at level α_p, testing begins with it at a more stringent level α_f and proceeds in the order of increasing magnitude of the mean squares. If a mean square is found significant at level α_f, it and all larger mean squares are declared significant.
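The testing sequence just described can be sketched in code. This is a minimal illustration under a loudly stated assumption: each candidate mean square (1 degree of freedom) is compared with the mean of the k squares already pooled through an ordinary F(1, k) reference distribution. The critical values used in this report are not reproduced here, so the sketch will not, in general, reproduce the U_j statistics or the decisions shown in table V. The names `chain_pool` and `f_sf` are hypothetical. The F tail probability is computed from the regularized incomplete beta function so that no third-party library is needed.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated by the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # convergence checked after the odd step
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) variate."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def chain_pool(mean_squares, m, alpha_p, alpha_f):
    """One chain pooling pass; returns (eta, significant mean squares).

    Assumption (not the report's critical values): plain F(1, k) test at each step.
    """
    ms = sorted(mean_squares)
    if not 1 <= m < len(ms):
        raise ValueError("need 1 <= m < number of mean squares")
    pooled, k = sum(ms[:m]), m      # the m smallest squares are assumed null
    final_stage = False             # True once a square is significant at alpha_p
    for j in range(m, len(ms)):
        p_value = f_sf(ms[j] / (pooled / k), 1, k)
        if not final_stage and p_value <= alpha_p:
            final_stage = True      # alpha_p = 1.0 sends the first test straight here
        if final_stage and p_value <= alpha_f:
            return k, ms[j:]        # this square and all larger ones are significant
        pooled += ms[j]             # not significant: pool into the denominator
        k += 1
    return k, []


eta, significant = chain_pool([1.0, 1.1, 0.9, 1.2, 0.95, 50.0, 80.0],
                              m=3, alpha_p=1.0, alpha_f=0.05)
print(eta, significant)  # prints: 5 [50.0, 80.0]
```

The returned count of pooled squares plays the role of the estimate η of the number of null effects. Replacing `f_sf` with the report's own critical values would change the decisions but not the control flow.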

This procedure gives an estimate η of the number of null effects in the experiment.

A second analysis of the data is then performed with the use of a test statistic with 
m > 1 smallest mean squares initially pooled into the denominator of the test statistic. 

For 2^4 design experiments, a good choice of m is the integer nearest (2/3)η. For 2^5 design experiments, a good choice of m is the integer nearest (1/2)η. For 2^6 design experiments, a good strategy consists of choosing α_p = 1.0 and α_f = 0.001, and then choosing m to control the type 1 risk.
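The nearest-integer rule for m can be written as a small helper. This is a sketch: the function name `preferred_m` is hypothetical, ties such as (1/2)(7) = 3.5 are rounded upward (the text says only "the integer nearest"), and m is kept at least 1. No rule is coded for 2^6 designs, for which the text recommends choosing m from the risk curves instead.

```python
import math


def preferred_m(r, eta):
    """Integer nearest the recommended fraction of eta for a 2^r design.

    r = 4 uses (2/3)*eta and r = 5 uses (1/2)*eta; other r are not covered.
    """
    fraction = {4: 2.0 / 3.0, 5: 0.5}.get(r)
    if fraction is None:
        raise ValueError("the nearest-integer rule is stated only for 2^4 and 2^5 designs")
    m = math.floor(fraction * eta + 0.5)  # round half up (assumed tie rule)
    return max(m, 1)                      # at least one mean square must be pooled


print(preferred_m(4, 8), preferred_m(4, 4), preferred_m(5, 12), preferred_m(5, 6))
# prints: 5 3 6 3
```

These four calls reproduce the revised values of m used in appendixes B and C (η = 8 and 4 for the 2^4 cobalt-alloy experiment, η = 12 and 6 for the 2^5 penicillin experiment).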

Curves are given so that some control over the average probability of type 1 error can be achieved through the choice of α_f.

The actual type 1 and type 2 error probabilities depend on the magnitudes of the real effects, and curves are given for estimating weighted average error probabilities after the real effects have been estimated.

The chain pooling procedure compared favorably with Daniel’s modulus ratio statistic (half-normal plotting) for the case of just one real effect.

Lewis Research Center, 

National Aeronautics and Space Administration, 

Cleveland, Ohio, July 6, 1967, 

129-03-01-03-22. 
APPENDIX A


CURVES FOR POSTERIOR ESTIMATION OF RISKS 

The assumptions are now introduced that a 2^r design experiment has been completed, that a preliminary chain pooling ANOVA has been performed with m = 1 and with some α_p and α_f, and that a second ANOVA has been performed with some preferred m > 1. The experimenter may be satisfied with the results, or he may have questions such as the following:

(1) Should the ANOVA be performed at some other α_f?

(2) Should the conditions of the experiment be changed to provide larger values of the λ_i?

(3) Should more precise (and more costly) instrumentation be used? 

Partial answers to these questions may be obtained from posterior estimates of the type 1 and type 2 risks. The starting point of such estimates is the estimation of an average noncentrality parameter λ.

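An average of the λ_i, as computed from equation (3), is

$$\lambda = \frac{1}{\rho}\sum_{i=1}^{\rho}\lambda_i = \frac{2^r}{\rho\,\sigma^2}\sum_{i=1}^{\rho}\beta_i^2$$

Replacing the unknown parameters by quantities estimated from the experiment gives

$$\hat{\lambda} = \frac{2^r}{\rho\,\hat{\sigma}^2}\sum_{i=1}^{\rho}\hat{\beta}_i^2 \tag{A1}$$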


The multiple decision ANOVA procedure leads to the conclusion that η mean squares are free from real effects and that ρ = 2^r - η - 1 mean squares contain real effects.

The ρ effects concluded to be significant have estimators β̂_i given by Yates’ algorithm.

The estimate σ̂² can be obtained from the procedure given by Wilk, Gnanadesikan, and Freeny (ref. 3). With η regarded as the total sample size, the censored sample consisting of the m smallest ordered contrasts is used to estimate σ².

Information such as that presented in figures 8(a), 10, and 11(a) and (b) is presented in a different form in figure 17. The statistician may enter figure 17 with (1) the value of r for the design of the experiment, (2) the value of α_f used in the decision making, (3) the value of η resulting from the decision procedure, and (4) the average value λ computed as just described. Visual interpolation with respect to η and λ should then result in a pair of values R_1(η) and R_2(λ) that would be the posterior risk estimates for the total procedure. If the values of η and λ do not fall within the ranges of the existing curves in figure 17, the statistician can still notice whether his values of η and λ suggest that the risks are less than, or greater than, some values for which curves have been drawn. Figure 17 was obtained from Monte Carlo computations in which the true values of η and λ were known. The statistician’s posterior estimates of R_1(η) and R_2(λ) will therefore depend on errors of estimate in η and λ. Furthermore, if the distribution of the λ_i in the real experiment is favorable to the decision procedure, the estimate of R_2(λ) will be too high, because figure 17 was based on unfavorable distributions of the λ_i.

The value to the experimenter of making these posterior risk estimates is twofold:

(1) For the strategy with m > 1, the type 1 risk depends on the value of λ, so that the selection of a strategy (m, α_p, α_f) might result in a true R_1(η) somewhat different from what the statistician had desired when he first selected a strategy according to figure 16. The posterior estimate of R_1(η) might then tell the statistician to perform another analysis with a different value of α_f.

(2) The analysis with m > 1 is done with no knowledge of λ and therefore with no attempt to balance the type 1 and type 2 risks. In some cases, the experimenter may want to do some balancing. Suppose the value of λ were small, so that the posterior risk estimates were R_1(η) = 0.20 and R_2(λ) = 0.20. Then further analysis would seem to be of no value. On the other hand, if λ were large and the posterior risk estimates were, for example, R_1(η) = 0.20 and R_2(λ) = 0.02, the experimenter might desire another analysis (presumably with decreased α_f or increased m) that would decrease R_1(η) at the expense of increasing R_2(λ).


APPENDIX B


ILLUSTRATIVE EXAMPLE - COBALT-BASE-ALLOY DEVELOPMENT,

ONE-HALF REPLICATE WITH FIVE FACTORS 

General Chain Pooling ANOVA Strategies 

The analysis is presented from two points of view. The first, the significance point of view, assumes that the statistician must be conservative in drawing conclusions and that the probability of making type 1 errors will be controlled. The second, the screening point of view, assumes that all effects that might be real should be identified as such. With this point of view, the statistician is particularly concerned about type 2 errors.

In either case, the ANOVA proceeds in two analyses. The first analysis begins with the strategy m = 1 and leads to an estimate η of the number of null mean squares. Based on this estimate of η, a second iteration of the ANOVA is performed with preferred m > 1.

The results of the Yates’ computation are given in table V. The responses are logarithms of stress-rupture times to failure of cast specimens. The effects to be estimated (table V) include 1 grand mean, 5 main effects, and 10 interactions.
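As an illustration of the Yates computation referred to here, the sketch below applies Yates’ algorithm to the 16 responses of table V, listed in standard order of factors a, b, c, d: (1), ae, be, ab, ce, ac, bc, abce, de, ad, bd, abde, cd, acde, bcde, abcd. Following footnote a of table V, the estimates are contrasts divided by 2^4. The mean squares are taken as contrast²/2^4, an assumption that is consistent with the values in table V rather than a formula stated in this section. The function name `yates` is hypothetical.

```python
def yates(y):
    """Yates' algorithm for a 2^k design whose responses are in standard order.

    Returns the final column: element 0 is the grand total, and element j > 0
    is the contrast for the effect in the j-th standard-order position.
    """
    n = len(y)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of responses must be a power of 2")
    col = list(y)
    for _ in range(k):
        pairs = [(col[2 * i], col[2 * i + 1]) for i in range(n // 2)]
        # first half: sums of successive pairs; second half: differences
        col = [lo + hi for lo, hi in pairs] + [hi - lo for lo, hi in pairs]
    return col


# Responses of table V (logarithms of stress-rupture times), standard order
responses = [2.2715, 2.0708, 1.3745, 1.2458, 2.2810, 2.0950, 1.5207, 1.5290,
             1.8199, 1.5687, 0.7947, 1.2701, 2.2006, 1.9759, 1.1924, 1.2240]
totals = yates(responses)
n = len(responses)
estimates = [t / n for t in totals]                   # beta_hat_0 ... beta_hat_15
mean_squares = sorted(t * t / n for t in totals[1:])  # 15 single-d.f. squares
print(round(estimates[0], 4), round(estimates[2], 4), round(mean_squares[-1], 4))
# prints: 1.6522 -0.3833 2.3502
```

The printed values match the grand mean, the estimate of β_2, and the largest mean square in table V.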

The strategy (m, α_p, α_f) is chosen according to information on the operating characteristics given for m = 1 by figure 15(a) and for preferred m > 1 by figure 16(a).

The use of these curves requires that a prior estimate be made of η. For this example, the number of null mean squares is assumed to be equal to the number of highest-order interactions; namely, the assumption is η = 10.

In addition to making a prior estimate of η, the use of the curves of figures 15 and 16 also requires that some assumption be made about the value of R_2(λ). Because the operating characteristics with m = 1 are not sensitive to the value of λ, the curves of figure 15 are closely spaced with respect to R_2(λ); therefore, a single arbitrary value (the middle curve, for R_2(λ) = 0.10) will be used. The operating characteristics are critically dependent on λ, or R_2(λ), when m > 1, as in figure 16. In using figure 16, assumptions are made about R_2(λ) according to the purpose of the ANOVA. Comparatively large type 2 risks should be anticipated with the significance point of view, and for such a situation the assumption is R_2(λ) = 0.20. With the screening point of view, comparatively small type 2 risks should be anticipated, and R_2(λ) = 0.05 is assumed. These assumed risk levels of 0.05, 0.10, or 0.20 are higher than the significance levels often used in statistical procedures. These high type 2 risk levels are regarded as being consistent with the election to use the economy model experiment
TABLE V. - ILLUSTRATIVE EXAMPLE OF ONE-HALF 2^5 DESIGN EXPERIMENT

                                                           Significance point of view
                                                    First analysis with               Second analysis with
                                                    (m, α_p, α_f) = (1, 0.25, 0.01)   (m, α_p, α_f) = (5, 1.0, 0.01)
 j   Treatment  Responses,  Param-  Parameter    Mean squares,   U_j        U_j     U_j       U_j        U_j
     levels     y_j         eters   estimates,   in increasing   (exper.)   (α_p)   (α_f)     (exper.)   (α_f)
                                    β̂_j (a)      order
 0   (1)        2.2715      β_0      1.6522
 1   ae         2.0708      β_1      -.0297      0.000079
 2   be         1.3745      β_2      -.3833       .000103        1.1319     1.923   1.99986
 3   ab         1.2458      β_12      .0781       .004408      b 2.8810     2.527   2.9809
 4   ce         2.2810      β_3       .1002       .007567        2.9295
 5   ac         2.0950      β_13     -.0166       .014160        2.9619
 6   bc         1.5207      β_23     -.0025       .018099        2.9701                       2.4449     5.31
 7   abce       1.5290      -β_45    -.0217       .020248        2.9733                       2.6090
 8   de         1.8199      β_4      -.1464       .021949        2.9753                       2.7285
 9   ad         1.5687      β_14      .0336       .028594      c 2.9810                       3.1244
10   bd          .7947      β_24     -.0022       .032091                                     3.2966
11   abde       1.2701      -β_35     .0448       .043254                                     3.7303
12   cd         2.2006      β_34      .0423       .097554                                     4.7253
13   acde       1.9759      -β_25    -.0356       .160510                                     5.1548
14   bcde       1.1924      -β_15    -.0520       .342750                                   c 5.5722
15   abcd       1.2240      -β_5     -.0370      2.350200

a β̂_j are contrasts divided by 2^4.
b Significant at level α_p.
c Significant at level α_f.


that has no replication. Of course, the parameters of the true physical situation could be such that these assumptions are much too large or much too small. Such assumptions must be made so that a decision strategy can be selected that will be somewhat appropriate to the experimenter’s needs in the design and analysis of the experiment. When the analysis is completed, posterior estimates can be made of the risks R_1(η) and R_2(λ).

Because the type 2 risks should always be minimized, the largest α_f value that will not result in too large a value of R_1(η) should be chosen. Since the operating characteristics with m = 1 are inferior to those with preferred m > 1, the statistician should be less stringent about R_1(η) when using m = 1 than when using preferred m > 1. Under the previously stated assumptions concerning R_2(λ), the strategy (m, α_p, α_f) will be chosen from figures 15(a) and 16(a) according to the resulting value of R_1(η).




Significance Point of View 


First analysis, m = 1:

(1) Assume that η = 10 and R_2(λ) = 0.10.

(2) In figure 15(a) with η = 10 and R_2(λ) = 0.10, choose the highest α_f that yields an acceptable R_1(η) and note the required α_p. Figure 15(a) shows that if α_f = 0.01, R_1(η) is 0.14 and α_p is required to be 0.25.

(3) The strategy (m, α_p, α_f) = (1, 0.25, 0.01) and the chain pooling ANOVA resulted in η = 8 and ρ = 7.

Second analysis, m ≈ (2/3)η = (2/3)(8) ≈ 5:

(1) Assume that η = 8 (the estimate from the first analysis) and that R_2(λ) = 0.20.

(2) In figure 16(a) with η = 8 and R_2(λ) = 0.20, choose the highest α_f that yields an acceptable R_1(η) and note the required α_p. Figure 16(a) shows that if α_f = 0.01, R_1(η) is 0.08 and α_p is required to be 1.00.

(3) The strategy (m, α_p, α_f) = (5, 1.00, 0.01) and the chain pooling ANOVA resulted in η = 13 and ρ = 2.


Screening Point of View 


First analysis, m = 1:

(1) Assume that η = 10 and that R_2(λ) = 0.10.

(2) In figure 15(a) with η = 10 and R_2(λ) = 0.10, choose the highest α_f that yields an acceptable R_1(η) and note the required α_p. Figure 15(a) shows that if α_f = 0.025, R_1(η) is 0.24 and α_p is required to be 0.50.

(3) The strategy (m, α_p, α_f) = (1, 0.50, 0.025) and the chain pooling ANOVA resulted in η = 4 and ρ = 11.

Second analysis, m ≈ (2/3)η = (2/3)(4) ≈ 3:

(1) Assume that η = 4 and that R_2(λ) = 0.05.

(2) In figure 16(a) with η = 4 and R_2(λ) = 0.05, choose the highest α_f that yields an acceptable R_1(η) and note the required α_p. Figure 16(a) shows that if α_f = 0.05, R_1(η) is 0.17 and α_p is required to be 1.00.

(3) The strategy (m, α_p, α_f) = (3, 1.00, 0.05) and the chain pooling ANOVA resulted in η = 11 and ρ = 4.

In summary, the significance point of view with m = 1 resulted in ρ = 7, whereas the preferred m = 5 resulted in ρ = 2. The screening point of view with m = 1 resulted in ρ = 11, whereas the preferred m = 3 resulted in ρ = 4. The need for going beyond m = 1 to the second iteration with m > 1 is obvious. The statistician might seek convergence of η by continuing with further iterations of the ANOVA, using new values of m ≈ (2/3)η, where η is determined by the immediately preceding iteration. The operating characteristics of such a procedure have not been investigated.


Posterior Estimates of Risks 

The risk levels of figure 16 for preferred m > 1 depend on the value of η, which was only crudely estimated by the initial analysis with m = 1. Furthermore, the risk levels of figure 16 depend on the values of the λ_i, and no information on their values was used in entering figure 16. For these two reasons, posterior estimates of R_1(η) and R_2(λ) are highly desirable. Such estimates will now be made for the second iteration under the screening point of view.

The quantity σ² is estimated in the manner of reference 3, which requires a decision to use some small number of the smallest mean squares. The value m = 3, as used in the ANOVA, is retained for this estimate of σ². The equivalence between the notation of reference 3 and the present notation is

Also, 
TABLE VI. - ONE-HALF 2^5 DESIGN EXPERIMENT ON COBALT ALLOYS

Key: η, number of null mean squares; ρ, number of real effects; m, number of mean squares pooled before testing; α_p, nominal size of preliminary pooling test; α_f, nominal size of final significance test; λ, average noncentrality parameter; R_1(η), type 1 risk; R_2(λ), type 2 risk.

           Prior  Prior   Assumed                        Est.  Est.  Revised  Est.   Posterior  Posterior
Analysis   η      R_1(η)  R_2(λ)   m   α_p    α_f        η     ρ     m        λ      R_1(η)     R_2(λ)

                                      Significance point of view
First      10     0.14    0.10     1   0.25   0.01        8     7    5
Second      8     0.08    0.20     5   1.00   0.01       13     2    -        23.6   a 0.07     a 0.07

                                       Screening point of view
First      10     0.24    0.10     1   0.50   0.025       4    11    3
Second      4     0.17    0.05     3   1.00   0.05       11     4    -        20.1   0.18       0.06

a Minor extrapolation.


The ANOVA results were η = 11 and ρ = 4. The significant effects are therefore the four largest absolute values of the β̂_j (aside from β̂_0) of table V:

Estimate   Value
β̂_2       -0.3833
β̂_4       -0.1464
β̂_3        0.1002
β̂_12       0.0781

The mean noncentrality parameter is given by equation (A1):

$$\hat{\lambda} = \frac{2^4}{\rho\,\hat{\sigma}^2}\sum_{i=1}^{\rho}\hat{\beta}_i^2 = \frac{16\left[(0.3833)^2 + (0.1464)^2 + (0.1002)^2 + (0.0781)^2\right]}{(4)(0.0366)} = 20.164$$
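The arithmetic of equation (A1) for this example can be checked with a few lines of code. This sketch assumes, as in the worked example, 2^4 = 16 runs, the ρ = 4 estimates listed above, and σ̂² = 0.0366 as it appears in the denominator. The function name `average_noncentrality` is hypothetical. The small difference from 20.164 presumably reflects rounding of the tabulated estimates.

```python
def average_noncentrality(real_estimates, sigma2_hat, n_runs):
    """Equation (A1): n_runs * sum(beta_hat**2) / (rho * sigma2_hat)."""
    rho = len(real_estimates)  # number of effects declared real
    if rho == 0:
        raise ValueError("at least one real effect is needed")
    return n_runs * sum(b * b for b in real_estimates) / (rho * sigma2_hat)


lam = average_noncentrality([-0.3833, -0.1464, 0.1002, 0.0781], 0.0366, 16)
print(round(lam, 2))  # prints: 20.16
```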
The posterior risk estimates are obtained from figure 17(a). The strategy used α_f = 0.05 to conclude that η = 11. From figure 17(a) with λ = 20, η = 11, and α_f = 0.05, crude graphical extrapolation suggests that, roughly, R_1(η) = 0.18 and R_2(λ) = 0.06.

The four analyses of the 2^4 experiment on cobalt-base alloys are summarized in table VI. In review, the analysis was begun from the significance point of view with m = 1 and with the guess η = 10. Because the type 1 errors are insensitive to λ with m = 1, an arbitrary value could be assumed for R_2(λ), and the value chosen was R_2(λ) = 0.10. Under these conditions, the choice of (m, α_p, α_f) = (1, 0.25, 0.01) and the use of figure 15(a) gave the estimate R_1(η) = 0.14. Performance of the analysis resulted in η = 8 and ρ = 7, and for this 2^4 experiment a preferred value of m was then chosen according to table V as m ≈ (2/3)η = (2/3)(8) ≈ 5. The second analysis, with m = 5, was begun with a prior estimate of η from the first analysis, namely, η = 8. From a significance point of view, the type 2 error should be expected to be high, and (consistent with this point of view) the type 2 risk was arbitrarily assumed to be large, namely, R_2(λ) = 0.20. The decision procedure with (m, α_p, α_f) = (5, 1.00, 0.01) then begins with R_1(η) estimated from figure 16(a) as R_1(η) = 0.08. The procedure resulted in η = 13 and ρ = 2. The mean squares selected as being significant (aside from the grand mean) are the ρ = 2 largest mean squares. The posterior estimation of risks using figure 17(a) then leads to R_1(η) = 0.07 and R_2(λ) = 0.07. Thus, the posterior estimate of R_1(η) is fairly close to the initial control value of R_1(η) = 0.08; however, the posterior estimate R_2(λ) = 0.07 (which is now based on ρ and λ) is quite different from the arbitrary initial assumption of R_2(λ) = 0.20 (which was chosen with no knowledge of λ).

As presented in table VI, the posterior risk estimates under the significance point of view were based on η = 13 and ρ = 2, whereas the screening point of view resulted in η = 11 and ρ = 4. Thus, the strategy used with the screening point of view has doubled the number of effects concluded to be significant.

The posterior risks for the screening point of view, when compared with those for the significance point of view, showed that increasing R_1(η) from 0.07 to 0.18 achieved a reduction of R_2(λ) from 0.07 to 0.06. That such a small improvement in R_2(λ) occurred at such a large cost in R_1(η) might be associated with the fact that R_2(λ) was already at a point of diminishing returns in the analysis with the significance point of view. Of course, the important conclusion with respect to the experiment is that at least ρ = 2 effects (aside from the grand mean) are clearly significant, whereas as many as ρ = 4 effects are possibly significant. The reader may compare these conclusions with his own interpretation of the half-normal plot shown in figure 18.






Figure 18. - Half-normal plot for one-half 2^5 design experiment on cobalt-base alloys. (Abscissa: estimate of regression coefficient, 0 to 0.4.)
APPENDIX C


ILLUSTRATIVE EXAMPLE - PENICILLIN PRODUCTION, 

FIVE-FACTOR EXPERIMENT IN TWO BLOCKS 

This experiment was first analyzed in reference 1. It was also used as an example of half-normal plotting in reference 2 and as an example of variance estimation in reference 3. The results of a chain pooling ANOVA, together with posterior estimates of the type 1 and type 2 risks, are given in table VII. These results will be compared, for equal levels of the type 1 risk, with results achieved in a manner similar to that of reference 1.

TABLE VII. - 2^5 DESIGN EXPERIMENT ON PENICILLIN

Key: η, number of null mean squares; ρ, number of real effects; m, number of mean squares pooled before testing; α_p, nominal size of preliminary pooling test; α_f, nominal size of final significance test; λ, average noncentrality parameter; R_1(η), type 1 risk; R_2(λ), type 2 risk.

           Prior  Prior   Assumed                        Est.  Est.  Revised  Est.   Posterior  Posterior
Analysis   η      R_1(η)  R_2(λ)   m   α_p    α_f        η     ρ     m        λ      R_1(η)     R_2(λ)

                                      Significance point of view
First       6     0.026   0.10     1   0.50   0.005      12    19    6
Second     12     0.10    0.20     6   1.00   0.001      20    11    -        21.5   a 0.10     a 0.04

                                       Screening point of view
First       6     0.08    0.10     1   0.50   0.01        6    25    3
Second      6     0.19    0.05     3   1.00   0.01       12    19    -        51.1   b 0.18     b 0.012

a Minor extrapolation.
b Major extrapolation.


As summarized in table VII, the analysis from a significance point of view with preferred m = 6 led to the conclusions that η = 20 and ρ = 11. The posterior risks were estimated at R_1(η) = 0.10 and R_2(λ) = 0.04. The analysis from a screening point of view led to the conclusions that η = 12 and ρ = 19. The posterior estimates of risk were off the curves, but extrapolated values are (very roughly) R_1(η) = 0.20 and R_2(λ) = 0.01. The only meaning that can be attached to such extrapolated risk estimates is that no more than the 19 largest mean squares should be regarded as significant. The significance point of view, with posterior R_1(η) = 0.10 and R_2(λ) = 0.04, gave ρ = 11, which seems to be a reasonable appraisal of the experiment.
Davies’ analysis (ref. 1, example 9.2, pp. 383 to 387 and pp. 416 to 418) is an attempt to test significance at the 5-percent level with the use of the three- and four-factor interactions (15 degrees of freedom) to estimate error variance. (The five-factor interaction was not pooled into the error estimate because it had been confounded with a block effect.) Davies concluded that the significant effects were β_1, β_3, β_5, and β_35; however, he felt that the conclusion with respect to β_35 was only weakly supported by the present experiment and was mainly supported by prior experiments. Based on the pooled three- and four-factor interactions, the error variance was estimated at σ̂² = 0.0034. The upper 0.05 point of the F-distribution with 1 and 15 degrees of freedom is 4.54, whereas the 0.10 point is 3.07. Therefore, at the 0.05 level, the mean squares that should be considered significant are all those larger than (0.0034)(4.54) = 0.01544, whereas at the 0.10 level, mean squares should be considered significant if they exceed (0.0034)(3.07) = 0.01044. At these levels (see table 9C.1 of ref. 1), the significant effects are as follows:

0.05 Level:            0.10 Level:
β_5                    β_5
β_1                    β_1
β_3                    β_3
β_35                   β_35
β_12345 (blocks)       β_12345 (blocks)
                       β_12
                       β_135
                       β_1234

The use of the F-test with 1 and 15 degrees of freedom is equivalent to the use of the double-tailed t-test with 15 degrees of freedom. In essence, η = 23 and ρ = 8 are the conclusions of the double-tailed t-test with α = 0.10. Use of the double-tailed t-test with α = 0.10 is equivalent to testing the absolute values of the β̂_j with a single-tailed t-test with α = 0.05. For such a test, and with 15 degrees of freedom, tabulated data from reference 11 were used to plot the type 2 error curve shown in figure 19. This curve was then used with the estimated values of the eight largest β̂_j (aside from the grand mean) to compute a weighted average posterior estimate of the type 2 risk in accordance with equations (3), (5), (6), and (14). The result was R_2(λ) = 0.060.
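The F-test/t-test equivalence and the cutoffs quoted from Davies’ analysis can be checked numerically. The sketch below (function names `t_pdf` and `t_two_tailed` are hypothetical) integrates the Student t density with Simpson’s rule to obtain the two-tailed area beyond t = √F for the two F points, and multiplies σ̂² = 0.0034 by those points to reproduce the mean-square cutoffs. It is a numerical check only, not the procedure of reference 11.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1.0 + t * t / nu) ** (-(nu + 1) / 2)


def t_two_tailed(t, nu, n=2000):
    """P(|T| > t) from Simpson's rule over [0, t]; n must be even."""
    h = t / n
    s = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 1.0 - 2.0 * s * h / 3.0


sigma2_hat, nu = 0.0034, 15           # Davies' pooled error variance and its d.f.
for f_point in (4.54, 3.07):          # upper 0.05 and 0.10 points of F(1, 15)
    cutoff = sigma2_hat * f_point     # mean squares above this are declared significant
    tail = t_two_tailed(math.sqrt(f_point), nu)
    print(f"F = {f_point}: cutoff = {cutoff:.5f}, two-tailed t area = {tail:.3f}")
```

Because F with 1 and ν degrees of freedom is the square of t with ν degrees of freedom, the two-tailed t areas come out close to 0.05 and 0.10, the levels at which the F points were quoted.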

The chain pooling ANOVA and the use of the t-test with Davies’ estimate of σ² are then comparable at α = R_1(η) = 0.10. The comparison is as follows:
Figure 19. - Type 2 error probability for single-tail t-test of size α = 0.05 with 15 degrees of freedom (ref. 11).


Chain pooling         t-Test
R_1(η) = 0.10         α = 0.10
m = 6                 d.f. = 15
η = 20                η = 23
ρ = 11                ρ = 8
R_2(λ) = 0.04         R_2(λ) = 0.060

Thus, with the same type 1 risk, chain pooling operated with a lower type 2 risk and declared three more effects significant than the t-test did.